Added misaligned exception mask (MXCSR.MM) information.

Added imm8 values with corresponding mnemonics to (V)CMPPD, 
(V)CMPPS, (V)CMPSD, and (V)CMPSS. 

Reworded CPU ID information in condition tables. 

Added minor clarifications and corrected typographical and formatting 
errors. 

September 2006    3.08    Made minor corrections.
December 2005     3.07    Made minor editorial and formatting changes.
January 2005      3.06    Added documentation on SSE3 instructions. Corrected numerous
                          minor factual errors and typos.
September 2003    3.05    Made numerous small factual corrections.
April 2003        3.04    Made minor corrections.
Preface


About This Book 

This book is part of a multivolume work entitled the AMD64 Architecture Programmer's Manual.
The complete set includes the following volumes. 


Title                                                           Order No.
Volume 1: Application Programming                               24592
Volume 2: System Programming                                    24593
Volume 3: General-Purpose and System Instructions               24594
Volume 4: 128-Bit and 256-Bit Media Instructions                26568
Volume 5: 64-Bit Media and x87 Floating-Point Instructions      26569


Audience 

This volume is intended for programmers who develop application or system software. 

Organization 

Volumes 3, 4, and 5 describe the AMD64 instruction set in detail, providing mnemonic syntax, 
instruction encoding, functions, affected flags, and possible exceptions. 

The AMD64 instruction set is divided into five subsets: 

• General-purpose instructions 

• System instructions 

• Streaming SIMD Extensions (includes 128-bit and 256-bit media instructions) 

• 64-bit media instructions (MMX™) 

• x87 floating-point instructions 

Several instructions belong to, and are described identically in, multiple instruction subsets. 

This volume describes the Streaming SIMD Extensions (SSE) instruction set, which includes 128-bit
and 256-bit media instructions. SSE includes both legacy and extended forms. The index at the end 
cross-references topics within this volume. For other topics relating to the AMD64 architecture, and 
for information on instructions in other subsets, see the tables of contents and indexes of the other 
volumes. 
Conventions and Definitions

The section that follows, Notational Conventions, describes notational conventions used in this
volume. The next section, Definitions, lists a number of terms used in this volume along with their
technical definitions. Some of these definitions assume knowledge of the legacy x86 architecture. See
“Related Documents” on page xl for further information about the legacy x86 architecture. Finally, the
Registers section lists the registers that are a part of the system programming model.

Notational Conventions 

Section 1.1, “Syntax and Notation” on page 2 describes notation relating specifically to instruction 
encoding. 

#GP(0) 

An instruction exception; in this example, a general-protection exception with error code of 0.

1011b

A binary value, in this example, a 4-bit value. 

F0EA_0B40h 

A hexadecimal value, in this example a 32-bit value. Underscore characters may be used to 
improve readability. 


128 

Numbers without an alpha suffix are decimal unless the context indicates otherwise. 


7:4 

A bit range, from bit 7 to 4, inclusive. The high-order bit is shown first. Commas may be inserted 
to indicate gaps. 

#GP(0) 

A general-protection exception (#GP) with error code of 0. 

CPUID FnXXXX_XXXX_RRR[FieldName]

Support for optional features or the value of an implementation-specific parameter of a processor
can be discovered by executing the CPUID instruction on that processor. To obtain this value,
software must execute the CPUID instruction with the function code XXXXXXXXh in EAX and
then examine the field FieldName returned in register RRR. If the “RRR” notation is followed by
“_xYYY”, register ECX must be set to the value YYYh before executing CPUID. When FieldName
is not given, the entire contents of register RRR contain the desired value. When determining
optional feature support, if the bit identified by FieldName is set to a one, the feature is supported
on that processor.
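
The notation above can be taken apart mechanically. The following is a minimal sketch, not part of the architecture definition: the helper name parse_cpuid_notation and the dictionary keys are hypothetical, and the code only splits the notation string into the EAX input, the optional ECX input, the result register, and the field name. It does not execute CPUID.

```python
import re

# Pattern for the notation "CPUID FnXXXX_XXXX_RRR[_xYYY][FieldName]".
# The leading "CPUID" word and the field name are both optional.
_CPUID_RE = re.compile(
    r"^(?:CPUID\s+)?Fn(?P<fn>[0-9A-Fa-f]{4}_[0-9A-Fa-f]{4})"
    r"_(?P<reg>E[ABCD]X)"
    r"(?:_x(?P<sub>[0-9A-Fa-f]+))?"
    r"(?:\[(?P<field>[^\]]+)\])?$"
)


def parse_cpuid_notation(text):
    """Split a CPUID notation string into its parts (hypothetical helper)."""
    m = _CPUID_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a CPUID notation string: {text!r}")
    return {
        # Function code XXXXXXXXh that software loads into EAX.
        "eax_in": int(m.group("fn").replace("_", ""), 16),
        # Value YYYh for ECX, or None when no "_xYYY" suffix is present.
        "ecx_in": int(m.group("sub"), 16) if m.group("sub") else None,
        # Register RRR that holds the result after CPUID executes.
        "register": m.group("reg"),
        # FieldName, or None when the whole register is the value.
        "field": m.group("field"),
    }
```

For example, parse_cpuid_notation("CPUID Fn8000_0001_EDX[LM]") reports an EAX input of 80000001h, no ECX input, result register EDX, and field LM.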

CR0-CR4 

A register range, from register CR0 through CR4, inclusive, with the low-order register first.

CR4[OSXSAVE], CR4.OSXSAVE

The OSXSAVE bit of the CR4 register.

CR0[PE] = 1, CR0.PE = 1

The PE bit of the CR0 register has a value of 1.

EFER[LME] = 0, EFER.LME = 0 

The LME field of the EFER register is cleared (contains a value of 0). 

DS:rSI 

The content of a memory location whose segment address is in the DS register and whose offset 
relative to that segment is in the rSI register. 

RFLAGS[13:12] 

A field within a register identified by its bit range. In this example, corresponding to the IOPL 
field. 
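
As an illustration of the numeric and bit-range notation, the sketch below writes the example values as Python literals and extracts a bit range. The helper name bit_field and the sample RFLAGS value are assumptions made for this example only.

```python
def bit_field(value, high, low):
    """Return bits high:low of value (inclusive, high-order bit first)."""
    if high < low:
        raise ValueError("bit range is written high-order bit first")
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


# 1011b and F0EA_0B40h written as Python literals; Python also accepts
# underscores as digit separators.
four_bits = 0b1011
thirty_two_bits = 0xF0EA_0B40

# RFLAGS[13:12] is the IOPL field. The RFLAGS value here is made up
# for the example and is not taken from a real processor.
example_rflags = 0x0000_0000_0000_3202
iopl = bit_field(example_rflags, 13, 12)
```

The same helper covers a range such as 7:4, which for F0EA_0B40h selects the high nibble of the low-order byte.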

Definitions 

128-bit media instruction 

Instructions that operate on the various 128-bit vector data types. Supported within both the legacy 
SSE and extended SSE instruction sets. 

256-bit media instruction 

Instructions that operate on the various 256-bit vector data types. Supported within the extended 
SSE instruction set. 

64-bit media instructions 

Instructions that operate on the 64-bit vector data types. These are primarily a combination of 
MMX and 3DNow!™ instruction sets and their extensions, with some additional instructions from 
the SSE1 and SSE2 instruction sets. 

16-bit mode 

Legacy mode or compatibility mode in which a 16-bit address size is active. See legacy mode and 
compatibility mode. 

32-bit mode 

Legacy mode or compatibility mode in which a 32-bit address size is active. See legacy mode and 
compatibility mode. 

64-bit mode 

A submode of long mode. In 64-bit mode, the default address size is 64 bits and new features, such 
as register extensions, are supported for system and application software. 
absolute

A displacement that references the base of a code segment rather than an instruction pointer. 

See relative. 

AES 

Advanced Encryption Standard (AES) algorithm acceleration instructions; part of Streaming SIMD
Extensions (SSE). 

ASID 

Address space identifier. 

AVX 

Extension of the SSE instruction set supporting 256-bit vector (packed) operands. See Streaming 
SIMD Extensions. 

biased exponent 

The sum of a floating-point value’s exponent and a constant bias for a particular floating-point data 
type. The bias makes the range of the biased exponent always positive, which allows reciprocation 
without overflow. 

byte 

Eight bits.

clear, cleared

To write the value 0 to a bit or a range of bits. See set.

compatibility mode

A submode of long mode. In compatibility mode, the default address size is 32 bits, and legacy 16-bit
and 32-bit applications run without modification.

commit 

To irreversibly write, in program order, an instruction’s result to software-visible storage, such as a 
register (including flags), the data cache, an internal write buffer, or memory. 

CPL 

Current privilege level.

direct

Referencing a memory address included in the instruction syntax as an immediate operand. The
address may be an absolute or relative address. See indirect.

displacement 

A signed value that is added to the base of a segment (absolute addressing) or an instruction pointer 
(relative addressing). Same as offset. 
doubleword

Two words, or four bytes, or 32 bits.

double quadword

Eight words, or 16 bytes, or 128 bits. Also called octword.

effective address size

The address size for the current instruction after accounting for the default address size and any 
address-size override prefix. 

effective operand size 

The operand size for the current instruction after accounting for the default operand size and any 
operand-size override prefix. 

element 

See vector. 

exception 

An abnormal condition that occurs as the result of instruction execution. Processor response to an
exception depends on the type of exception. For all exceptions except SSE floating-point 
exceptions and x87 floating-point exceptions, control is transferred to a handler (or service 
routine) for that exception as defined by the exception’s vector. For floating-point exceptions 
defined by the IEEE 754 standard, there are both masked and unmasked responses. When 
unmasked, the exception handler is called, and when masked, a default response is provided 
instead of calling the handler. 

extended SSE instructions 

Enhanced set of SIMD instructions supporting 256-bit vector data types and allowing the 
specification of up to four operands. A subset of the Streaming SIMD Extensions (SSE). Includes 
the AVX, FMA, FMA4, and XOP instructions. Compare legacy SSE. 

flush 

An often ambiguous term meaning (1) writeback, if modified, and invalidate, as in “flush the cache
line,” or (2) invalidate, as in “flush the pipeline,” or (3) change a value, as in “flush to zero.” 

FMA4 

Fused Multiply Add, four operand. Part of the extended SSE instruction set. 

FMA 

Fused Multiply Add. Part of the extended SSE instruction set. 

GDT 

Global descriptor table. 
GIF

Global interrupt flag. 

IDT 

Interrupt descriptor table. 

IGN 

Ignored. Value written is ignored by hardware. Value returned on a read is indeterminate. See 
reserved. 

indirect 

Referencing a memory location whose address is in a register or other memory location. The 
address may be an absolute or relative address. See direct. 

IRB 

The virtual-8086 mode interrupt-redirection bitmap. 

IST

The long-mode interrupt-stack table. 

IVT 

The real-address mode interrupt-vector table. 

LDT 

Local descriptor table.

legacy x86

The legacy x86 architecture.

legacy mode

An operating mode of the AMD64 architecture in which existing 16-bit and 32-bit applications and 
operating systems run without modification. A processor implementation of the AMD64 
architecture can run in either long mode or legacy mode. Legacy mode has three submodes: real
mode, protected mode, and virtual-8086 mode.

legacy SSE instructions 

All Streaming SIMD Extensions instructions prior to AVX, XOP, and FMA4. Legacy SSE 
instructions primarily utilize operands held in XMM registers. The legacy SSE instructions 
include the original Streaming SIMD Extensions (SSE1) and the subsequent extensions SSE2, 
SSE3, SSSE3, SSE4, SSE4A, SSE4.1, and SSE4.2. See Streaming SIMD instructions. 

long mode 

An operating mode unique to the AMD64 architecture. A processor implementation of the 
AMD64 architecture can run in either long mode or legacy mode. Long mode has two submodes:
64-bit mode and compatibility mode.
lsb

Least-significant bit. 

LSB 

Least-significant byte.

main memory

Physical memory, such as RAM and ROM (but not cache memory) that is installed in a particular 
computer system. 

mask 

(1) A control bit that prevents the occurrence of a floating-point exception from invoking an 
exception-handling routine. (2) A field of bits used for a control purpose. 

MBZ 

Must be zero. If software attempts to set an MBZ bit to 1, a general-protection exception (#GP) 
occurs. See reserved. 

memory 

Unless otherwise specified, main memory.

moffset

A 16, 32, or 64-bit offset that specifies a memory operand directly, without using a ModRM or SIB 
byte. 

msb 

Most-significant bit. 

MSB 

Most-significant byte.

octword

Same as double quadword.

offset

Same as displacement.

overflow

The condition in which a floating-point number is larger in magnitude than the largest, finite, 
positive or negative number that can be represented in the data-type format being used. 

packed 

See vector. 

PAE 

Physical-address extensions. 
physical memory

Actual memory, consisting of main memory and cache.

probe

A check for an address in processor caches or internal buffers. External probes originate outside 
the processor, and internal probes originate within the processor. 

protected mode 

A submode of legacy mode. 

quadword 

Four words, eight bytes, or 64 bits. 

RAZ 

Read as zero. Value returned on a read is always zero (0) regardless of what was previously 
written. See reserved. 

real-address mode, real mode 

A short name for real-address mode, a submode of legacy mode.

relative

Referencing with a displacement (offset) from an instruction pointer rather than the base of a code
segment. See absolute. 

reserved 

Fields marked as reserved may be used at some future time. 

To preserve compatibility with future processors, reserved fields require special handling when 
read or written by software. Software must not depend on the state of a reserved field (unless 
qualified as RAZ), nor upon the ability of such fields to return a previously written state. 

If a field is marked reserved without qualification, software must not change the state of that field;
it must reload that field with the same value returned from a prior read.

Reserved fields may be qualified as IGN, MBZ, RAZ, or SBZ (see definitions).

REX 

A legacy instruction modifier prefix that specifies 64-bit operand size and provides access to 
additional registers. 

RIP-relative addressing 

Addressing relative to the 64-bit relative instruction pointer. 

SBZ 

Should be zero. An attempt by software to set an SBZ bit to 1 results in undefined behavior. See 
reserved. 
scalar

An atomic value existing independently of any specification of location, direction, etc., as opposed 
to vectors. 


set 

To write the value 1 to a bit or a range of bits. See clear. 

SIMD 

Single instruction, multiple data. See vector. 

Streaming SIMD Extensions (SSE) 

Instructions that operate on scalar or vector (packed) integer and floating-point numbers. The SSE
instruction set comprises the legacy SSE and extended SSE instruction sets. 

SSE1 

Original SSE instruction set. Includes instructions that operate on vector operands in both the 
MMX and the XMM registers. 

SSE2 

Extensions to the SSE instruction set. 

SSE3 

Further extensions to the SSE instruction set. 

SSSE3 

Further extensions to the SSE instruction set. 

SSE4.1 

Further extensions to the SSE instruction set. 

SSE4.2 

Further extensions to the SSE instruction set. 

SSE4A 

A minor extension to the SSE instruction set adding the instructions EXTRQ, INSERTQ, 
MOVNTSS, and MOVNTSD. 

sticky bit 

A bit that is set or cleared by hardware and that remains in that state until explicitly changed by 
software. 

TSS 

Task-state segment. 
underflow

The condition in which a floating-point number is smaller in magnitude than the smallest nonzero, 
positive or negative number that can be represented in the data-type format being used. 

vector 

(1) A set of integer or floating-point values, called elements, that are packed into a single operand. 
Most media instructions use vectors as operands. Also called packed or SIMD operands. 

(2) An interrupt descriptor table index, used to access exception handlers. See exception. 

VEX prefix 

Extended instruction encoding escape prefix. Introduces a two- or three-byte encoding escape 
sequence used in the encoding of AVX instructions. Opens a new extended instruction encoding 
space. Fields select the opcode map and allow the specification of operand vector length and an 
additional operand register. See XOP prefix.

virtual-8086 mode 

A submode of legacy mode. 

VMCB 

Virtual machine control block. 

VMM 

Virtual machine monitor.

word

Two bytes, or 16 bits. 


x86 

See legacy x86. 

XOP instructions 

Part of the extended SSE instruction set using the XOP prefix. See Streaming SIMD Extensions.

XOP prefix

Extended instruction encoding escape prefix. Introduces a three-byte escape sequence used in the 
encoding of XOP instructions. Opens a new extended instruction encoding space distinct from the 
VEX opcode space. Fields select the opcode map and allow the specification of operand vector 
length and an additional operand register. See VEX prefix. 

Registers 

In the following list of registers, mnemonics refer either to the register itself or to the register content:

AH-DH

The high 8-bit AH, BH, CH, and DH registers. See AL-DL.

AL-DL

The low 8-bit AL, BL, CL, and DL registers. See AH-DH.

AL-r15B

The low 8-bit AL, BL, CL, DL, SIL, DIL, BPL, SPL, and r8B-r15B registers, available in 64-bit
mode.


BP 

Base pointer register. 

CRn

Control register number n. 


CS 

Code segment register.

eAX-eSP

The 16-bit AX, BX, CX, DX, DI, SI, BP, and SP registers or the 32-bit EAX, EBX, ECX, EDX,
EDI, ESI, EBP, and ESP registers. See rAX-rSP.

EFER 

Extended features enable register.

eFLAGS

16-bit or 32-bit flags register. See rFLAGS. 

EFLAGS 

32-bit (extended) flags register. 


eIP

16-bit or 32-bit instruction-pointer register. See rIP.

EIP

32-bit (extended) instruction-pointer register.

FLAGS 

16-bit flags register. 

GDTR 

Global descriptor table register. 

GPRs 

General-purpose registers. For the 16-bit data size, these are AX, BX, CX, DX, DI, SI, BP, and SP. 
For the 32-bit data size, these are EAX, EBX, ECX, EDX, EDI, ESI, EBP, and ESP. For the 64-bit 
data size, these include RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, and R8-R15. 
IDTR

Interrupt descriptor table register. 


IP 

16-bit instruction-pointer register. 

LDTR 

Local descriptor table register. 

MSR 

Model-specific register.

r8-r15

The 8-bit R8B-R15B registers, or the 16-bit R8W-R15W registers, or the 32-bit R8D-R15D 
registers, or the 64-bit R8-R15 registers. 

rAX-rSP 

The 16-bit AX, BX, CX, DX, DI, SI, BP, and SP registers, or the 32-bit EAX, EBX, ECX, EDX, 
EDI, ESI, EBP, and ESP registers, or the 64-bit RAX, RBX, RCX, RDX, RDI, RSI, RBP, and RSP 
registers. Replace the placeholder r with nothing for 16-bit size, “E” for 32-bit size, or “R” for 64-bit
size.

RAX 

64-bit version of the EAX register. 

RBP 

64-bit version of the EBP register. 

RBX 

64-bit version of the EBX register. 

RCX 

64-bit version of the ECX register. 

RDI 

64-bit version of the EDI register. 

RDX 

64-bit version of the EDX register.

rFLAGS

16-bit, 32-bit, or 64-bit flags register. See RFLAGS. 

RFLAGS 

64-bit flags register. See rFLAGS. 
rIP

16-bit, 32-bit, or 64-bit instruction-pointer register. See RIP. 


RIP 

64-bit instruction-pointer register. 

RSI 

64-bit version of the ESI register. 

RSP 

64-bit version of the ESP register. 

SP 

Stack pointer register. 

SS 

Stack segment register. 

TPR 

Task priority register (CR8). 

TR 

Task register. 

YMM/XMM 

Set of sixteen (eight accessible in legacy and compatibility modes) 256-bit wide registers that hold 
scalar and vector operands used by the SSE instructions. 

Endian Order 

The x86 and AMD64 architectures address memory using little-endian byte-ordering. Multibyte 
values are stored with the least-significant byte at the lowest byte address, and illustrated with their 
least significant byte at the right side. Strings are illustrated in reverse order, because the addresses of 
string bytes increase from right to left. 
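
A short sketch of little-endian storage, using the standard Python struct module. The value is the 32-bit example from Notational Conventions; the variable names are chosen for this illustration only.

```python
import struct

value = 0xF0EA_0B40  # the 32-bit example value used in Notational Conventions

# "<I" asks for a little-endian unsigned 32-bit integer.
stored = struct.pack("<I", value)

# Index 0 is the lowest byte address, so the least-significant byte
# (40h) comes first and the most-significant byte (F0h) comes last.
bytes_by_address = [f"{b:02X}" for b in stored]
```

Reading bytes_by_address from right to left gives the value as it is normally written, which matches the convention of illustrating the least-significant byte at the right side.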
1 Introduction


Processors capable of performing the same mathematical operation simultaneously on multiple data 
streams are classified as single-instruction, multiple-data (SIMD). Instructions that utilize this 
hardware capability are called SIMD instructions. 

Software can utilize SIMD instructions to drastically increase the performance of media applications 
which typically employ algorithms that perform the same mathematical operation on a set of values in 
parallel. The original SIMD instruction set was called MMX and operated on 64-bit wide vectors of 
integer and floating-point elements. Subsequently a new SIMD instruction set called the Streaming 
SIMD Extensions (SSE) was added to the architecture. 

The SSE instruction set defines a new programming model with its own array of vector data registers 
(YMM/XMM registers) and a control and status register (MXCSR). Most SSE instructions pull their 
operands from one or more YMM/XMM registers and store results in a YMM/XMM register, 
although some instructions use a GPR as either a source or destination. Most instructions allow one 
operand to be loaded from memory. The set includes instructions to load a YMM/XMM register from 
memory (aligned or unaligned) and store the contents of a YMM/XMM register. 

An overview of the SSE instruction set is provided in Volume 1, Chapter 4. 

This volume provides detailed descriptions of each instruction within the SSE instruction set. The SSE 
instruction set comprises the legacy SSE instructions and the extended SSE instructions. 

Legacy SSE instructions comprise the following subsets: 

• The original Streaming SIMD Extensions (herein referred to as SSE1)

• SSE2 

• SSE3 

• SSSE3 

• SSE4.1 

• SSE4.2 

• SSE4A 

• Advanced Encryption Standard (AES) 

Extended SSE instructions comprise the following subsets: 

• AVX 

• AVX2 

• FMA 

• FMA4 

• XOP 
Legacy SSE architecture supports operations involving 128-bit vectors and defines the base
programming model including the SSE registers, the Media extension Control and Status Register 
(MXCSR), and the instruction exception behavior. 

The Streaming SIMD Extensions (SSE) instruction set is extended to include the AVX, FMA, FMA4, 
and XOP instruction sets. The AVX instruction set provides an extended form for most legacy SSE 
instructions and several new instructions. Extensions include providing for the specification of a 
unique destination register for operations with two or more source operands and support for 256-bit 
wide vectors. Some AVX instructions also provide enhanced functionality compared to their legacy 
counterparts. 

A significant feature of the extended SSE instruction set architecture is the doubling of the width of the
XMM registers. These registers are referred to as the YMM registers. The XMM registers overlay the
lower octword (128 bits) of the YMM registers. Registers YMM/XMM0-7 are accessible in legacy
and compatibility mode. Registers YMM/XMM8-15 are available in 64-bit mode (a submode of long
mode). VEX/XOP instruction prefixes allow instruction encodings to address the additional registers.

The SSE instructions can be used in processor legacy mode or long (64-bit) mode. CPUID 
Fn8000_0001_EDX[LM] indicates the availability of long mode. 

Compilation for execution in 64-bit mode offers the following advantages: 

• Access to an additional eight YMM/XMM registers for a total of 16 

• Access to an additional eight 64-bit general-purpose registers for a total of 16 

• Access to the 64-bit virtual address space and the RIP-relative addressing mode 

Hardware support for each of the subsets of SSE instructions listed above is indicated by CPUID 
feature flags. Refer to Volume 3, Appendix D, “Instruction Subsets and CPUID Feature Flags,” for a 
complete list of instruction-related feature flags. The CPUID feature flags that pertain to each 
instruction are also given in the instruction descriptions below. For information on using the CPUID 
instruction, see the instruction description in Volume 3. 

Chapter 2, “Instruction Reference” contains detailed descriptions of each instruction, organized in
alphabetic order by mnemonic. For those legacy SSE instructions that have an AVX form, the
extended form of the instruction is described together with the legacy instruction in one entry. For
these instructions, the instruction reference page is located based on the instruction mnemonic of the
legacy SSE and not the extended (AVX) form. Those AVX instructions without a legacy form are
listed in order by their AVX mnemonic. The mnemonics for all extended SSE instructions, including the
FMA and XOP instructions, begin with the letter V.

1.1 Syntax and Notation 

The descriptive synopsis of opcode syntax for legacy SSE instructions follows the conventions 
described in Volume 3: General Purpose and System Instructions. See Chapter 2 and the section 
entitled “Notation.” 
For general information on the programming model and overview descriptions of the SSE instruction
set, see:

• “Streaming SIMD Extensions Media and Scientific Programming” in Volume 1

• “Instruction Encoding” in Volume 3

• “Summary of Registers and Data Types” in Volume 3

The syntax of the extended instruction sets requires an expanded synopsis. The expanded synopsis 
includes a mnemonic summary and a summary of prefix sequence fields. Figure 1-1 shows the 
descriptive synopsis of a typical XOP instruction. The synopses of VEX-encoded instructions have the
same format, differing only in the instruction encoding escape prefix, that is, VEX instead of
XOP.


Mnemonic                                  Encoding
                                          XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMOV ymm1, ymm2, ymm3/mem256, ymm4      8F   RXB.08          0.src.1.00   A2 /r ib

VPCMOV ymm1, ymm2, ymm3/mem256, ymm4      assembly language representation
8F                                        encoding escape prefix
RXB                                       3-bit field representing R, X, B bit values
08                                        5-bit map_select field
0                                         W bit
src                                       vvvv field
1                                         L bit
00                                        pp field
A2                                        opcode
/r                                        register/memory type specifier
ib                                        immediate operand

Figure 1-1. Typical Descriptive Synopsis - Extended SSE instructions

1.2 Extended Instruction Encoding 

The legacy SSE instructions are encoded using the legacy encoding syntax and the extended 
instructions are encoded using an enhanced encoding syntax which is compatible with the legacy 
syntax. Both are described in detail in Chapter 1 of Volume 3. 

As described in Volume 3, the extended instruction encoding syntax utilizes multi-byte escape
sequences both to select alternate opcode maps and to augment the encoding of the instruction.
Multi-byte escape sequences are introduced by one of the two VEX prefixes or the XOP prefix.

The AVX and AVX2 instructions utilize either the two-byte (introduced by the VEX C5h prefix) or the 
three-byte (introduced by the VEX C4h prefix) encoding escape sequence. XOP instructions are 
encoded using a three-byte encoding escape sequence introduced by the XOP prefix (except for the 
XOP instructions VPERMIL2PD and VPERMIL2PS which are encoded using the VEX prefix). The 
XOP prefix is 8Fh. The three-byte encoding escape sequences utilize the map select field of the 
second byte to select the opcode map used to interpret the opcode byte. 
The two-byte VEX prefix sequence implicitly selects the secondary (“two-byte”) opcode map.
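
The three-byte escape sequence can be sketched as a simple bit-packing step. The sketch below is illustrative, not a definitive encoder: the function name pack_three_byte_escape is hypothetical, and it assumes the fields are packed most-significant first in the order the synopsis shows them (RXB.map_select in the second byte, W.vvvv.L.pp in the third). The arguments are the field values exactly as they appear in the encoding; Volume 3 defines how register numbers map onto those values.

```python
def pack_three_byte_escape(escape, rxb, map_select, w, vvvv, l, pp):
    """Pack the fields of a three-byte escape sequence into bytes.

    All arguments are field values as they appear in the encoding;
    this sketch does not translate register numbers into field values.
    """
    if escape not in (0xC4, 0x8F):
        raise ValueError("three-byte sequences start with C4h (VEX) or 8Fh (XOP)")
    for name, value, width in (
        ("rxb", rxb, 3), ("map_select", map_select, 5),
        ("w", w, 1), ("vvvv", vvvv, 4), ("l", l, 1), ("pp", pp, 2),
    ):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    # Second byte: the three R, X, B bits followed by the 5-bit map_select.
    second = (rxb << 5) | map_select
    # Third byte: the W bit, the 4-bit vvvv field, the L bit, the 2-bit pp field.
    third = (w << 7) | (vvvv << 3) | (l << 2) | pp
    return bytes([escape, second, third])
```

As a usage example under the assumption RXB = 111b, the field values 8F RXB.08 0.1111.0.00 pack to the bytes 8F E8 78, and C4 RXB.01 0.1111.1.10 packs to C4 E1 7E.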

1.2.1 Immediate Byte Usage Unique to the SSE instructions 

An immediate is a value, typically an operand, explicitly provided within the instruction encoding.
Depending on the opcode and the operating mode, the size of an immediate operand can be 1, 2, 4, or 8
bytes. Legacy and extended media instructions typically use an immediate byte operand (imm8).

A one-byte immediate is generally shown in the instruction synopsis as an “ib” suffix. For extended SSE
instructions with four source operands, the suffix “is4” is used to indicate the presence of the
immediate byte used to select the fourth source operand.

The VPERMIL2PD and VPERMIL2PS instructions utilize a fifth 2-bit operand which is encoded 
along with the fourth register select index in an immediate byte. For this special case the immediate 
byte will be shown in the instruction synopsis as “is5”. 

1.2.2 Instruction Format Examples 

The following sections provide examples of two-, three-, and four-operand extended instructions. 
These instructions generally perform nondestructive-source operations, meaning that the result of the 
operation is written to a separately specified destination register rather than overwriting one of the 
source operands. This preserves the contents of the source registers. Most legacy SSE instructions 
perform destructive-source operations, in which a single register is both source and destination, so 
source content is lost. 

1.2.2.1 XMM Register Destinations 

The following general properties apply to YMM/XMM register destination operands. 

• For legacy instructions that use XMM registers as a destination: When a result is written to a 
destination XMM register, bits [255:128] of the corresponding YMM register are not affected. 

• For extended instructions that use XMM registers as a destination: When a result is written to a 
destination XMM register, bits [255:128] of the corresponding YMM register are cleared. 
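
The two destination behaviors can be modeled with plain integers. This is a minimal sketch that treats a YMM register as a 256-bit Python integer; the function names are hypothetical and no real instruction is executed.

```python
XMM_MASK = (1 << 128) - 1   # bits [127:0]
YMM_MASK = (1 << 256) - 1   # bits [255:0]


def write_xmm_legacy(ymm, result):
    """Legacy SSE write: bits [255:128] of the YMM register are not affected."""
    upper = ymm & YMM_MASK & ~XMM_MASK
    return upper | (result & XMM_MASK)


def write_xmm_extended(ymm, result):
    """Extended SSE write: bits [255:128] of the YMM register are cleared.

    The previous register content (ymm) does not contribute to the result.
    """
    return result & XMM_MASK
```

Starting from a register whose upper octword is nonzero, the legacy form leaves that upper octword intact while the extended form returns a value whose bits [255:128] are all zero.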

1.2.2.2 Two Operand Instructions 

Two-operand instructions use ModRM-based operand assignment. For most instructions, the first 
operand is the destination, selected by the ModRM.reg field, and the second operand is either a register 
or a memory source, selected by the ModRM.r/m field. 

VCVTDQ2PD is an example of a two-operand AVX instruction. 

Mnemonic                          Encoding
                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTDQ2PD xmm1, xmm2/mem64        C4   RXB.01          0.1111.0.10  E6 /r
VCVTDQ2PD ymm1, xmm2/mem128       C4   RXB.01          0.1111.1.10  E6 /r
The destination register is selected by ModRM.reg. The size of the destination register is determined
by VEX.L. The source is either a YMM/XMM register or a memory location specified by ModRM.r/m.
Because this instruction converts packed doubleword integers to double-precision floating-point
values, the source data size is smaller than the destination data size.

VEX.vvvv is not used and must be set to 1111b. 
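
The ModRM-based operand assignment can be sketched as follows. The byte layout used here (mod in bits 7:6, reg in bits 5:3, r/m in bits 2:0, with mod = 11b indicating a register operand) is the ModRM layout described in Volume 3 rather than in this section, and the function names are hypothetical. The sketch returns the raw 3-bit field values and ignores the extension bits carried in the prefix.

```python
def split_modrm(modrm):
    """Split a ModRM byte into its mod, reg, and r/m fields."""
    if not 0 <= modrm <= 0xFF:
        raise ValueError("ModRM is a single byte")
    return {
        "mod": (modrm >> 6) & 0b11,
        "reg": (modrm >> 3) & 0b111,
        "rm": modrm & 0b111,
    }


def two_operand_selection(modrm):
    """Report which field values select the destination and the source."""
    fields = split_modrm(modrm)
    # mod = 11b means ModRM.r/m names a register; other values mean memory.
    source_kind = "register" if fields["mod"] == 0b11 else "memory"
    return {"dest": fields["reg"], "src": fields["rm"], "src_kind": source_kind}
```

For a ModRM byte of CAh (11 001 010b), the destination field holds 1 and the source is register field 2.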

1.2.2.3 Three-Operand Instructions 

These extended instructions have two source operands and a destination operand. 

VPROTB is an example of a three-operand XOP instruction. 

There are versions of the instruction for variable-count rotation and for fixed-count rotation. 

VPROTB dest, src, variable-count 
VPROTB dest, src, fixed-count 


Mnemonic                            Encoding
                                    XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPROTB xmm1, xmm2/mem128, xmm3      8F   RXB.09          0.src.0.00   90 /r
VPROTB xmm1, xmm2, xmm3/mem128      8F   RXB.09          1.src.0.00   90 /r
VPROTB xmm1, xmm2/mem128, imm8      8F   RXB.08          0.1111.0.00  90 /r ib


For both versions of the instruction, the destination (dest) operand is an XMM register specified by 
ModRM.reg. 

The variable-count version of the instruction rotates each byte of the source as specified by the
corresponding byte element of variable-count.

Selection of src and variable-count is controlled by XOP.W.

• When XOP.W = 0, src is either an XMM register or a 128-bit memory location specified by
ModRM.r/m, and variable-count is an XMM register specified by XOP.vvvv.

• When XOP.W = 1, src is an XMM register specified by XOP.vvvv and variable-count is either an
XMM register or a 128-bit memory location specified by ModRM.r/m.

Table 1-1 summarizes the effect of the XOP.W bit on operand selection. 


Table 1-1. Three-Operand Selection

XOP.W    dest         src          variable-count
0        ModRM.reg    ModRM.r/m    XOP.vvvv
1        ModRM.reg    XOP.vvvv     ModRM.r/m
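
Table 1-1 can be read as a small lookup. The sketch below is illustrative only; the function names are hypothetical, and the returned strings simply name the encoding field that selects each operand of the variable-count form.

```python
def three_operand_fields(xop_w):
    """Return the encoding field that selects each operand (Table 1-1)."""
    if xop_w == 0:
        return {"dest": "ModRM.reg", "src": "ModRM.r/m", "variable-count": "XOP.vvvv"}
    if xop_w == 1:
        return {"dest": "ModRM.reg", "src": "XOP.vvvv", "variable-count": "ModRM.r/m"}
    raise ValueError("XOP.W is a single bit")


def memory_capable_operand(xop_w):
    """Name the operand that may be a 128-bit memory location.

    Only the operand selected by ModRM.r/m can be a memory location.
    """
    fields = three_operand_fields(xop_w)
    return next(name for name, field in fields.items() if field == "ModRM.r/m")
```

With XOP.W = 0 the memory-capable operand is src; with XOP.W = 1 it is variable-count.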


The fixed-count version of the instruction rotates each byte of src as specified by the immediate byte 
operand fixed-count. For this version, src is either an XMM register or a 128-bit memory location 
specified by ModRM.r/m. Because XOP.vvvv is not used to specify the source register, it must be set 
to 1111b or execution of the instruction will cause an Invalid Opcode (#UD) exception. 
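
The operand selection rules for both versions of VPROTB can be sketched as a small lookup. This is an illustration only, not a decoder: the function name `vprotb_operand_fields` and its dictionary return format are hypothetical, and the sketch merely reports which encoding field supplies each operand.

```python
def vprotb_operand_fields(xop_w, fixed_count=False, vvvv=0b1111):
    """Return which encoding field supplies each VPROTB operand.

    Hypothetical helper for illustration; field names are plain strings.
    """
    if fixed_count:
        # Fixed-count form: XOP.vvvv is unused and must be 1111b.
        if vvvv != 0b1111:
            raise ValueError("#UD: XOP.vvvv must be 1111b")
        return {"dest": "ModRM.reg", "src": "ModRM.r/m", "count": "imm8"}
    if xop_w == 0:
        return {"dest": "ModRM.reg", "src": "ModRM.r/m", "count": "XOP.vvvv"}
    return {"dest": "ModRM.reg", "src": "XOP.vvvv", "count": "ModRM.r/m"}
```

In every case the destination comes from ModRM.reg; only the roles of ModRM.r/m and XOP.vvvv change.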

1.2.2.4 Four-Operand Instructions 

Some extended instructions have three source operands and a destination operand. This is 
accomplished by using the VEX/XOP.vvvv field, the ModRM.reg and ModRM.r/m fields, and bits 
[7:4] of an immediate byte to select the operands. The opcode suffix “is4” is used to identify the 
immediate byte, and the selected operands are shown in the synopsis. 

VFMSUBPD is an example of a four-operand FMA4 instruction. 

VFMSUBPD dest, src1, src2, src3        dest = src1 * src2 - src3 


Mnemonic                                  Encoding
                                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VFMSUBPD xmm1, xmm2, xmm3/mem128, xmm4    C4    RXB.03           0.src.0.01    6D /r is4
VFMSUBPD ymm1, ymm2, ymm3/mem256, ymm4    C4    RXB.03           0.src.1.01    6D /r is4
VFMSUBPD xmm1, xmm2, xmm3, xmm4/mem128    C4    RXB.03           1.src.0.01    6D /r is4
VFMSUBPD ymm1, ymm2, ymm3, ymm4/mem256    C4    RXB.03           1.src.1.01    6D /r is4


The first operand, the destination (dest), is an XMM register or a YMM register (as determined by 
VEX.L) selected by ModRM.reg. The following three operands (src1, src2, src3) are sources. 

The src1 operand is an XMM or YMM register specified by VEX.vvvv. 

VEX.W determines the configuration of the src2 and src3 operands. 

• When VEX.W = 0, src2 is either a register or a memory location specified by ModRM.r/m, and 
src3 is a register specified by bits [7:4] of the immediate byte. 

• When VEX.W = 1, src2 is a register specified by bits [7:4] of the immediate byte and src3 is either 
a register or a memory location specified by ModRM.r/m. 

Table 1-2 summarizes the effect of the VEX.W bit on operand selection. 


Table 1-2. Four-Operand Selection 

VEX.W   dest        src1       src2        src3
0       ModRM.reg   VEX.vvvv   ModRM.r/m   is4[7:4]
1       ModRM.reg   VEX.vvvv   is4[7:4]    ModRM.r/m
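
As a rough illustration of the four-operand selection rules, the sketch below maps VEX.W and the is4 immediate byte to the source of each operand. The function name `fma4_operand_fields` and the string labels are hypothetical; only the use of bits [7:4] of the immediate byte as a register number comes from the text above.

```python
def fma4_operand_fields(vex_w, is4):
    """Map VEX.W and the is4 immediate byte to FMA4 operand sources.

    Hypothetical helper for illustration; not a full instruction decoder.
    """
    imm_reg = (is4 >> 4) & 0xF  # register number held in is4[7:4]
    imm_src = f"register {imm_reg} (is4[7:4])"
    operands = {"dest": "ModRM.reg", "src1": "VEX.vvvv"}
    if vex_w == 0:
        operands["src2"] = "ModRM.r/m"
        operands["src3"] = imm_src
    else:
        operands["src2"] = imm_src
        operands["src3"] = "ModRM.r/m"
    return operands
```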


1.3 VSIB Addressing 

Specific AVX2 instructions utilize a vectorized form of indexed register-indirect addressing called 
vector SIB (VSIB) addressing. In contrast to the standard indexed register-indirect address mode, 
which generates a single effective address to access a single memory operand, VSIB addressing 
generates an array of effective addresses which is used to access data from multiple memory locations in 
a single operation. 

VSIB addressing is encoded using three or six bytes following the opcode byte, augmented by the X 
and B bits from the VEX prefix. The first byte is the ModRM byte with the standard mod, reg, and 
r/m fields (although allowed values for the mod and r/m fields are restricted). The second is the VSIB 
byte which replaces the SIB byte in the encoding. The VSIB byte specifies a GPR which serves as a 
base address register and an XMM/YMM register that contains a packed array of index values. The 
two-bit scale field specifies a common scaling factor to be applied to all of the index values. A 
constant displacement value is encoded in the one or four bytes that follow the VSIB byte. 

Figure 1-2 shows the format of the VSIB byte. 


[Figure: layout of the VSIB byte. Bits [7:6] hold the SS field, bits [5:3] hold the index field 
(VEX.X extends this field to 4 bits), and bits [2:0] hold the base field (VEX.B extends this 
field to 4 bits).] 

Figure 1-2. VSIB Byte Format 


VSIB.SS (Bits [7:6]). The SS field is used to specify the scale factor to be used in the computation 
of each of the effective addresses. The scale factor scale is equal to 2^SS (two raised to the power of the 
value of the SS field). Therefore, if SS = 00b, scale = 1; if SS = 01b, scale = 2; if SS = 10b, scale = 4; 
and if SS = 11b, scale = 8. 

VSIB.index (Bits [5:3]). This field is concatenated with the complement of the VEX.X bit ({X, 
index}) to specify the YMM/XMM register that contains the packed array of index values index[i] to 
be used in the computation of the array of effective addresses effective address[i]. 

VSIB.base (Bits [2:0]). This field is concatenated with the complement of the VEX.B bit ({B, 
base}) to specify the general-purpose register (base GPR) that contains the base address base to be 
used in the computation of each of the effective addresses. 
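
The field layout described above can be sketched as a small decoding routine. This is a minimal sketch under stated assumptions: `decode_vsib` is a hypothetical name, and `vex_x` and `vex_b` are assumed to be the bits as they appear in the VEX prefix, so each is complemented before being concatenated with the 3-bit field.

```python
def decode_vsib(vsib, vex_x, vex_b):
    """Split a VSIB byte into (scale, index register, base register).

    vex_x and vex_b are assumed to be the raw prefix bits; the text says
    their complement extends the index and base fields to 4 bits.
    """
    ss = (vsib >> 6) & 0b11       # VSIB.SS, bits [7:6]
    index = (vsib >> 3) & 0b111   # VSIB.index, bits [5:3]
    base = vsib & 0b111           # VSIB.base, bits [2:0]
    scale = 1 << ss               # scale = 2^SS
    index_reg = ((~vex_x & 1) << 3) | index
    base_reg = ((~vex_b & 1) << 3) | base
    return scale, index_reg, base_reg
```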

1.3.1 Effective Address Array Computation 

Each element i of the effective address array is computed using the formula: 

effective address[i] = scale * index[i] + base + displacement. 

where index[i] is the ith element of the XMM/YMM register specified by {X, VSIB.index}. An index 
element is either 32 or 64 bits wide and is treated as a signed integer. 
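
The formula can be illustrated with a short sketch that computes the whole address array at once. The function name and parameters are hypothetical, and truncating each result to `addr_bits` is an assumption of the sketch rather than something stated above.

```python
def effective_addresses(indices, scale, base, displacement,
                        index_bits=32, addr_bits=64):
    """Compute scale * index[i] + base + displacement for each index element.

    Each index element is interpreted as a signed integer of index_bits.
    Wrapping the sum to addr_bits is an assumption of this sketch.
    """
    def to_signed(value):
        value &= (1 << index_bits) - 1
        if value >> (index_bits - 1):
            value -= 1 << index_bits
        return value

    mask = (1 << addr_bits) - 1
    return [(scale * to_signed(ix) + base + displacement) & mask
            for ix in indices]
```

An index element of FFFFFFFFh is treated as -1, so with scale = 4 it addresses the doubleword just below base + displacement.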

Variants of this mode use either an eight-bit or a 32-bit displacement value. One variant sets the base 
to zero. The value of the ModRM.mod field specifies the specific variant of VSIB addressing mode, 
as shown in Table 1. In the table, the notation [XMMn/YMMn] indicates the XMM/YMM register 
that contains the packed index array and [base GPR] means the contents of the base GPR selected by 
{B, base}. 

Table 1: Vectorized Addressing Modes 

Index (Note 1)  ModRM.mod = 00                  ModRM.mod = 01                              ModRM.mod = 10
0000            scale * [XMM0/YMM0] + Disp32    scale * [XMM0/YMM0] + Disp8 + [base GPR]    scale * [XMM0/YMM0] + Disp32 + [base GPR]
0001            scale * [XMM1/YMM1] + Disp32    scale * [XMM1/YMM1] + Disp8 + [base GPR]    scale * [XMM1/YMM1] + Disp32 + [base GPR]
0010            scale * [XMM2/YMM2] + Disp32    scale * [XMM2/YMM2] + Disp8 + [base GPR]    scale * [XMM2/YMM2] + Disp32 + [base GPR]
0011            scale * [XMM3/YMM3] + Disp32    scale * [XMM3/YMM3] + Disp8 + [base GPR]    scale * [XMM3/YMM3] + Disp32 + [base GPR]
0100            scale * [XMM4/YMM4] + Disp32    scale * [XMM4/YMM4] + Disp8 + [base GPR]    scale * [XMM4/YMM4] + Disp32 + [base GPR]
0101            scale * [XMM5/YMM5] + Disp32    scale * [XMM5/YMM5] + Disp8 + [base GPR]    scale * [XMM5/YMM5] + Disp32 + [base GPR]
0110            scale * [XMM6/YMM6] + Disp32    scale * [XMM6/YMM6] + Disp8 + [base GPR]    scale * [XMM6/YMM6] + Disp32 + [base GPR]
0111            scale * [XMM7/YMM7] + Disp32    scale * [XMM7/YMM7] + Disp8 + [base GPR]    scale * [XMM7/YMM7] + Disp32 + [base GPR]
1000            scale * [XMM8/YMM8] + Disp32    scale * [XMM8/YMM8] + Disp8 + [base GPR]    scale * [XMM8/YMM8] + Disp32 + [base GPR]
1001            scale * [XMM9/YMM9] + Disp32    scale * [XMM9/YMM9] + Disp8 + [base GPR]    scale * [XMM9/YMM9] + Disp32 + [base GPR]
1010            scale * [XMM10/YMM10] + Disp32  scale * [XMM10/YMM10] + Disp8 + [base GPR]  scale * [XMM10/YMM10] + Disp32 + [base GPR]
1011            scale * [XMM11/YMM11] + Disp32  scale * [XMM11/YMM11] + Disp8 + [base GPR]  scale * [XMM11/YMM11] + Disp32 + [base GPR]
1100            scale * [XMM12/YMM12] + Disp32  scale * [XMM12/YMM12] + Disp8 + [base GPR]  scale * [XMM12/YMM12] + Disp32 + [base GPR]
1101            scale * [XMM13/YMM13] + Disp32  scale * [XMM13/YMM13] + Disp8 + [base GPR]  scale * [XMM13/YMM13] + Disp32 + [base GPR]
1110            scale * [XMM14/YMM14] + Disp32  scale * [XMM14/YMM14] + Disp8 + [base GPR]  scale * [XMM14/YMM14] + Disp32 + [base GPR]
1111            scale * [XMM15/YMM15] + Disp32  scale * [XMM15/YMM15] + Disp8 + [base GPR]  scale * [XMM15/YMM15] + Disp32 + [base GPR]

Note 1. Index = {VEX.X, VSIB.index}. In 32-bit mode, VEX.X = 1. 


1.3.2 Notational Conventions Related to VSIB Addressing Mode 

In the instruction descriptions that follow, the notation vm32x indicates a packed array of four 32-bit 
index values contained in the specified XMM index register and vm32y indicates a packed array of 
eight 32-bit index values contained in the specified YMM index register. Depending on the 
instruction, these indices can be used to compute the effective address of up to four (vm32x) or eight 
(vm32y) memory-based operands. 

The notation vm64x indicates a packed array of two 64-bit index values contained in the specified 
XMM index register and vm64y indicates a packed array of four 64-bit index values contained in the 
specified YMM index register. Depending on the instruction, these indices can be used to compute 
the effective address of up to two (vm64x) or four (vm64y) memory-based operands. 

In the body of the description of the instructions, the notation mem32[vm32x] is used to represent a 
sparse array of 32-bit memory operands where the packed array of four 32-bit indices used to 
calculate the effective addresses of the operands is held in an XMM register. The notation mem32[vm32y] 
refers to a similar array of 32-bit memory operands where the packed array of eight 32-bit indices is 
held in a YMM register. The notation mem32[vm64x] means a sparse array of 32-bit memory 
operands where the packed array of two 64-bit indices is held in an XMM register and mem32[vm64y] 
means a sparse array of 32-bit memory operands where the packed array of four 64-bit indices is held 
in a YMM register. 

The notation mem64[index_array], where index_array is either vm32x, vm64x, or vm64y, 
specifies a sparse array of 64-bit memory operands addressed via a packed array of 32-bit or 64-bit indices 
held in an XMM/YMM register. If an instruction uses either an XMM or a YMM register, depending 
on operand size, to hold the index array, the notation vm32x/y or vm64x/y is used to represent the 
array. 

In summary, given a maximum operand size of 256 bits, a sparse array of 32-bit memory-based 
operands can be addressed using a vm32x, vm32y, vm64x, or vm64y index array. A sparse array of 
64-bit memory-based operands can be addressed using a vm32x, vm64x, or vm64y index array. 
Specific instructions may use fewer than the maximum number of memory operands that can be 
addressed using the specified index array. 

VSIB addressing is only valid in 32-bit or 64-bit effective addressing mode and is only supported for 
instruction encodings using the VEX prefix. The ModRM.mod value of 11b is not valid in VSIB 
addressing mode and ModRM.r/m must be set to 100b. 

1.3.3 Memory Ordering and Exception Behavior 

VSIB addressing has some special considerations relative to memory ordering and the signaling of 
exceptions. 

VSIB addressing specifies an array of addresses that allows an instruction to access multiple memory 
locations. The order in which data is read from or written to memory is not specified. Memory 
ordering with respect to other instructions follows the memory-ordering model described in Volume 2. 

Data may be accessed by the instruction in any order, but access-triggered exceptions are delivered in 
right-to-left order. That is, if an exception is triggered by the load or store of an element of an 
XMM/YMM register and delivered, all elements to the right of that element (all the lower indexed 
elements) have been or will be completed without causing an exception. Elements to the left of the 
element causing the exception may or may not be completed. If the load or store of a given element 
triggers multiple exceptions, they are delivered in the conventional order. 

Because data can be accessed in any order, elements to the left of the one that triggered the exception 
may be read or written before the exception is delivered. Although the ordering of accesses is not 
specified, it is repeatable in a specific processor implementation. Given the same input values and 
initial architectural state, the same set of elements to the left of the faulting one will be accessed. 

VSIB addressing should not be used to access memory-mapped I/O as the ordering of the individual 
loads is implementation-specific and some implementations may access data larger than the data 
element size or access elements more than once. 

1.4 Enabling SSE Instruction Execution 

Application software that utilizes the SSE instructions requires support from operating system 
software. 

To enable and support SSE instruction execution, operating system software must: 

• enable hardware for supported SSE subsets 

• manage the SSE hardware architectural state, saving and restoring it as required during and after 
task switches 

• provide exception handlers for all unmasked SSE exceptions. 

See Volume 2, Chapter 11, for details on enabling SSE execution and managing its execution state. 

1.5 String Compare Instructions 

The legacy SSE instructions PCMPESTRI, PCMPESTRM, PCMPISTRI, and PCMPISTRM and the 
extended SSE instructions VPCMPESTRI, VPCMPESTRM, VPCMPISTRI, and VPCMPISTRM 
provide a versatile means of classifying characters of a string by performing one of several different 
types of comparison operations using a second string as a prototype. 

This section describes the operation of the legacy string compare instructions. This discussion applies 
equally to the extended versions of the instructions. Any difference between the legacy and the 
extended version of a given instruction is described in the instruction reference entry for the 
instruction in the following chapter. 

A character string is a vector of data elements that is normally used to represent an ordered 
arrangement of graphemes which may be stored, processed, displayed, or printed. Ordered strings of 
graphemes are most often used to convey information in a human-readable manner. The string 
compare instructions, however, do not restrict the use or interpretation of their operands. 

The first source operand provides the prototype string and the second operand is the string to be 
scanned and characterized (referred to herein as the string under test, or SUT). Four string formats and 
four types of comparisons are supported. The intermediate result of this processing is a bit vector that 
summarizes the characterization of each character in the SUT. This bit vector is then post-processed 
based on options specified in the instruction encoding. Instruction variants determine the final result— 
either an index or a mask. 

Instruction execution affects the arithmetic status flags (ZF, CF, SF, OF, AF, PF), but the significance 
of many of the flags is redefined to provide information tailored to the result of the comparison 
performed. See Section 1.5.6, “Affect on Flags” on page 19. 

The instructions have a defined base function and additional functionality controlled by bit fields in an 
immediate byte operand (imm8). The base function determines whether the source strings have 
implicitly (PCMPISTRI and PCMPISTRM) or explicitly (PCMPESTRI and PCMPESTRM) defined 
lengths, and whether the result is an index (PCMPISTRI and PCMPESTRI) or a mask (PCMPISTRM 
and PCMPESTRM). 

PCMPISTRI and PCMPESTRI return their final result (an integer value) via the ECX register, while 
PCMPISTRM and PCMPESTRM write a bit or character mask, depending on the option selected, to 
the XMM0 register. 

There are a number of different schemes for encoding a set of graphemes, but the most common ones 
use either an 8-bit code (ASCII) or a 16-bit code (Unicode). The string compare instructions support 
both character sizes. 

Bit fields of the immediate operand control the following functions: 

• Source data format — character size (byte or word), signed or unsigned values 

• Comparison type 

• Intermediate result post-processing 

• Output option selection 

This overview description covers functions common to all of the string compare instructions and 
describes some of the differentiated features of specific instructions. Information on instruction 
encoding and exception behavior is covered in the individual instruction reference pages in the 
following chapter. 

1.5.1 Source Data Format 


The character strings that constitute the source operands for the string compare instructions are 
formatted as either 8-bit or 16-bit integer values packed into a 128-bit data type. The figure below 
illustrates how a string of byte-wide characters is laid out in memory and how these characters are 
arranged when loaded into an XMM register. 


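[Figure: a 128-bit string of byte-wide characters in memory (ASCII encoded) and the corresponding 
XMM register image.] 

Memory address   Register byte   Character
112h (highest)   15              [null] (00h)
111h             14              . (2Eh)
110h             13              g (67h)
10Fh             12              n (6Eh)
10Eh             11              i (69h)
10Dh             10              r (72h)
10Ch             9               t (74h)
10Bh             8               s (73h)
10Ah             7               [blank] (20h)
109h             6               t (74h)
108h             5               r (72h)
107h             4               o (6Fh)
106h             3               h (68h)
105h             2               s (73h)
104h             1               [blank] (20h)
103h (lowest)    0               A (41h)

The lowest address (103h) defines the address of the string. In the XMM register image, bytes 7 
through 0 occupy bits [63:0] and bytes 15 through 8 occupy bits [127:64]. 

Figure 1-3. Byte-wide Character String - Memory and Register Image 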


Note from the figure that the longest string that can be packed in a 128-bit data object is either sixteen 
8-bit characters (as illustrated) or eight 16-bit characters. When loaded from memory, the character 
read from the lowest address in memory is placed in the least-significant position of the register and 
the character read from the highest address is placed in the most-significant position. In other words, 
for character i of width w, bits [w-1:0] of the character are placed in bits [iw + (w-1):iw] of the 
register. 
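
The placement rule can be checked with a short sketch that packs character codes into a 128-bit integer standing in for the XMM register. The helper name `pack_string` is hypothetical; the byte string used in the example is the one shown in Figure 1-3.

```python
def pack_string(chars, width=8):
    """Pack character codes into a 128-bit integer, lowest address first."""
    if width not in (8, 16):
        raise ValueError("width must be 8 or 16")
    if len(chars) > 128 // width:
        raise ValueError("too many characters for a 128-bit operand")
    value = 0
    for i, code in enumerate(chars):
        # bits [w-1:0] of character i land in bits [i*w + (w-1) : i*w]
        value |= (code & ((1 << width) - 1)) << (i * width)
    return value


# The string from Figure 1-3: 15 characters followed by a null.
reg = pack_string(list(b"A short string.\x00"))
```

Here byte 0 of `reg` holds 41h ("A") and byte 15 holds the terminating null.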


Bits [1:0] of the immediate byte operand specify the source string data format, as shown in Table 1-3. 


Table 1-3. Source Data Format 

Imm8[1:0]   Character Format   Maximum String Length
00b         unsigned bytes     16
01b         unsigned words     8
10b         signed bytes       16
11b         signed words       8


The string compare instructions are defined with the capability of operating on strings of lengths from 
0 to the maximum that can be packed into the 128-bit data type as shown in the table above. Because 
strings being processed may be shorter than the maximum string length, a means is provided to 
designate the length of each string. As mentioned above, one pair of string compare instructions relies 
on an explicit method while the other utilizes an implicit method. 

For the explicit method, the length of the first operand (the prototype string) is specified by the 
absolute value of the signed integer contained in rAX and the length of the second operand (the SUT) 
is specified by the absolute value of the signed integer contained in rDX. If a specified length is greater 
than the maximum allowed, the maximum value is used. Using the explicit method of length 
specification, null characters (characters whose numerical value is 0) can be included within a string. 

Using the implicit method, a string shorter than the maximum length is terminated by a null character. 
If no null character is found in the string, its length is implied to be the maximum. For the example 
illustrated in Figure 1-3 above, the implicit length of the string is 15 because the final character is null. 
However, using the explicit method, a specified length of 16 would include the null character in the 
string. 
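
The two length-specification methods can be sketched as follows. Both function names are hypothetical, and `reg_bits`, the width at which the rAX or rDX value is interpreted as a signed integer, is an assumed parameter that is not specified in this section.

```python
def explicit_length(reg_value, m, reg_bits=32):
    """Explicit method: absolute value of the signed register, capped at m.

    reg_bits is an assumption of this sketch, not taken from the text.
    """
    value = reg_value & ((1 << reg_bits) - 1)
    if value >> (reg_bits - 1):
        value -= 1 << reg_bits  # reinterpret as a negative signed integer
    return min(abs(value), m)


def implicit_length(chars, m):
    """Implicit method: index of the first null, or m if none is found."""
    for i, code in enumerate(chars[:m]):
        if code == 0:
            return i
    return m
```

For the string in Figure 1-3, `implicit_length` yields 15, while an explicit length of 16 would include the null character.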

In the following discussion, l1 is the length of the first operand string (the prototype string), l2 is the 
length of the second operand string (the SUT) and m is the maximum string length based on the 
selected character size. 

1.5.2 Comparison Type 

Although the string compare instructions can be implemented in many different ways, the instructions 
are most easily understood as the sequential processing of the SUT using the characters of the 
prototype string as a template. The template is applied at each character index of the SUT, processing the 
string from the first character (index 0) to the last character (index l2 - 1). 

The result of each comparison is recorded in successive positions of a summary bit vector CmprSumm. 
When the sequence of comparisons is complete, this bit vector summarizes the results of comparison 
operations that were performed. The length of the CmprSumm bit vector is equal to the maximum 
input operand string length (m). The rules for the setting of CmprSumm bits beyond the end of the SUT 
(CmprSumm[m-1:l2]) are dependent on the comparison type (see Table 1-4 below). 

Bits [3:2] of the immediate byte operand determine the comparison type, as shown in Table 1-4. 

Table 1-4. Comparison Type 

Imm8[3:2]   Comparison Type   Description
00b         Subset            Tests each character of the SUT to determine if it is within the subset of 
                              characters specified by the prototype string. Each set bit of CmprSumm 
                              indicates that the corresponding character of the SUT is within the subset 
                              specified by the prototype. Bits [m-1:l2] are cleared. 
01b         Ranges            Tests each character of the SUT to determine if it lies within one or more 
                              ranges specified by pairs of values within the prototype string. The ranges 
                              are inclusive. Each set bit in CmprSumm indicates that the corresponding 
                              character of the SUT is within one or more of the inclusive ranges specified. 
                              Bits [m-1:l2] are cleared. If the length of the prototype is odd, the last value 
                              in the prototype is effectively ignored. 
10b         Match             Performs a character-by-character comparison between the SUT and the 
                              prototype string. Each set bit of CmprSumm indicates that the 
                              corresponding characters in the two strings match. If not, the bit is cleared. 
                              Bits [m-1:max(l1, l2)] of CmprSumm are set. 
11b         Sub-string        Searches for an exact match between the prototype string and an ordered 
                              sequence of characters (a sub-string) in the SUT beginning at the current 
                              index i. Bit i of CmprSumm is set for each value of i where the sub-string 
                              match is made, otherwise the bit is cleared. See discussion below. 


In the Sub-string comparison type, any matching sub-string of the SUT must match the prototype 
string one-for-one, in order, and without gaps. Null characters in the SUT do not match non-null 
characters in the prototype. If the prototype and the SUT are equal in length and less than the max 
length, the two strings must be identical for the comparison to be TRUE. In this case, bit 0 of 
CmprSumm is set to one and the remainder are all 0s. If the length of the SUT is less than the prototype 
string, no match is possible and CmprSumm is all 0s. 

If the prototype string is shorter than the SUT (l1 < l2), a sequential search of the SUT is performed. 
For each i from 0 to l2 - l1, the prototype is compared to characters [i + l1 - 1:i] of the SUT. If the 
prototype and the sub-string SUT[i + l1 - 1:i] match exactly, then CmprSumm[i] is set, otherwise the bit 
is cleared. When the comparison at i = l2 - l1 is complete, no further testing is required because there 
are not enough characters remaining in the SUT for a match to be possible. The remaining bits l2 - l1 + 1 
through m-1 are all set to 0. 

For the Match comparison type, the character-by-character comparison is performed on all m 
characters in the 128-bit operand data, which may extend beyond the end of one or both strings. A null 
character at index i within one string is not considered a match when compared with a character 
beyond the end of the other string. In this case, CmprSumm[i] is cleared. For index positions beyond 
the end of both strings, CmprSumm[i] is set. 

The following section provides more detail on the generation of the comparison summary bit vector 
based on the specified comparison type. 

1.5.3 Comparison Summary Bit Vector 

The following pseudo code provides more detail on the generation of the comparison summary bit 
vector CmprSumm. The function CompareStrgs defined below returns a bit vector of length m, the 
maximum length of the operand data strings. 

bit vector CompareStrgs(ProtoType, length1, SUT, length2, CmpType, signed, m)
    doubleword vector StrUndTst    // temp vector; holds string under test
    doubleword vector StrProto     // temp vector; holds prototype string
    bit vector[m] Result           // length of vector is m

    StrProto = m{0}                // initialize m elements of StrProto to 0
    StrUndTst = m{0}               // initialize m elements of StrUndTst to 0
    Result = m{0}                  // initialize result bit vector

    FOR i = 0 to length1 - 1
        StrProto[i] = signed ? SignExtend(ProtoType[i]) : ZeroExtend(ProtoType[i])

    FOR i = 0 to length2 - 1
        StrUndTst[i] = signed ? SignExtend(SUT[i]) : ZeroExtend(SUT[i])

    IF CmpType == Subset
        FOR j = 0 to length2 - 1                // j indexes SUT
            FOR i = 0 to length1 - 1            // i indexes prototype
                Result[j] |= (StrProto[i] == StrUndTst[j])

    IF CmpType == Ranges
        FOR j = 0 to length2 - 1                // j indexes SUT
            FOR i = 0 to length1 - 2, BY 2      // i indexes prototype
                Result[j] |= (StrProto[i] <= StrUndTst[j])
                             && (StrProto[i + 1] >= StrUndTst[j])

    IF CmpType == Match
        FOR i = 0 to (min(length1, length2) - 1)
            Result[i] = (StrProto[i] == StrUndTst[i])
        FOR i = min(length1, length2) to (max(length1, length2) - 1)
            Result[i] = 0
        FOR i = max(length1, length2) to (m - 1)
            Result[i] = 1

    IF CmpType == Sub-string
        IF (length2 == 16) && (length1 == 16)
            maxlength = 15
        ELSE
            maxlength = length2 - length1
        IF length2 >= length1
            FOR j = 0 to maxlength              // j indexes result bit vector
                Result[j] = 1
                k = j                           // k scans the SUT
                FOR i = 0 to length1 - 1        // i scans the Prototype
                    // Result[j] is cleared if any of the comparisons do not match
                    Result[j] &= (StrProto[i] == StrUndTst[k])
                    k++

    Return Result

Given the above definition of CompareStrgs(), the following pseudo code computes the value of 
CmprSumm: 

    ProtoType = contents of first source operand (xmm1)
    SUT = contents of xmm2 or 128-bit value read from the specified memory location
    length1 = length of first operand string     // specified implicitly or explicitly
    length2 = length of second operand string    // specified implicitly or explicitly
    m = Maximum String Length from Table 1-3 above
    CmpType = Comparison Type from Table 1-4 above
    signed = (imm8[1] == 1) ? TRUE : FALSE

    bit vector[m] CmprSumm                       // CmprSumm is m bits long

    CmprSumm = CompareStrgs(ProtoType, length1, SUT, length2, CmpType, signed, m)
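
For readers who want to experiment with the pseudo code, the following is a minimal Python sketch of the same logic. It is not a definitive model: `compare_strgs` and `bits_to_str` are hypothetical names, the inputs are assumed to be sequences of already sign- or zero-extended, mutually comparable values (Python strings work for ASCII), and the special case in which both strings are 16 characters long in a Sub-string comparison is deliberately left unimplemented because the pseudo code does not say what is compared once k passes the end of the SUT.

```python
def compare_strgs(proto, sut, cmp_type, m=16):
    """Return CmprSumm as a list of m bits (index 0 = least significant)."""
    l1, l2 = len(proto), len(sut)
    if l1 > m or l2 > m:
        raise ValueError("string longer than the maximum length m")
    result = [0] * m
    if cmp_type == "subset":
        for j in range(l2):
            result[j] = int(any(p == sut[j] for p in proto))
    elif cmp_type == "ranges":
        for j in range(l2):
            # Pairs (proto[i], proto[i+1]) are inclusive ranges; an odd
            # trailing value is ignored.
            result[j] = int(any(proto[i] <= sut[j] <= proto[i + 1]
                                for i in range(0, l1 - 1, 2)))
    elif cmp_type == "match":
        for i in range(min(l1, l2)):
            result[i] = int(proto[i] == sut[i])
        for i in range(max(l1, l2), m):
            result[i] = 1  # beyond the end of both strings
    elif cmp_type == "substring":
        if l1 == 16 and l2 == 16:
            # Special case in the pseudo code; not modeled in this sketch.
            raise NotImplementedError("full-length sub-string case")
        for j in range(l2 - l1 + 1):  # empty when the SUT is shorter
            result[j] = int(all(proto[i] == sut[j + i] for i in range(l1)))
    else:
        raise ValueError("unknown comparison type")
    return result


def bits_to_str(bits):
    """Render a bit list with the least significant bit on the left."""
    return "".join(str(b) for b in bits)


print(bits_to_str(compare_strgs("ZCx", "aCx%xbZreCx", "subset")))
# prints 0110101001100000
```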


The following examples demonstrate the comparison summary bit vector CmprSumm for each 
comparison type. For the sake of illustration, the operand strings are represented as ASCII-encoded 
strings. Each character value is represented by its ASCII grapheme. Strings are displayed with the 
lowest indexed character on the left as they would appear when printed or displayed. CmprSumm is 
shown in reverse order with the least significant bit on the left to agree with the string presentation. 

Comparison Type = Subset 

Prototype: ZCx 

SUT: aCx%xbZreCx 

CmprSumm: 0110101001100000 

Comparison Type = Ranges 

Prototype: ACax 

SUT: aCx%xbZreCx 

CmprSumm: 1110110111100000 

Comparison Type = Match 

Prototype: ZCx 

SUT: aCx%xbZreCx 

CmprSumm: 0110000000011111 

Comparison Type = Sub-string 

Prototype: ZCx 

SUT: aZCx%xCZreZCxCZ 

CmprSumm: 0100000000100000 

1.5.4 Intermediate Result Post-processing 

Post-processing of the CmprSumm bit vector is controlled by imm8[5:4]. The result of this step is 
designated pCmprSumm. 

Bit [4] of the immediate operand determines whether a ones’ complement (bit-wise inversion) is 
performed on CmprSumm; bit [5] of the immediate operand determines whether the inversion applies 
to the entire comparison summary bit vector (CmprSumm) or just to those bits that correspond to 
characters within the SUT. See Table 1-5 below for the encoding of the imm8[5:4] field. 


Table 1-5. Post-processing Options 

Imm8[5:4]   Post-processing Applied
x0b         pCmprSumm = CmprSumm
01b         pCmprSumm = NOT CmprSumm
11b         pCmprSumm[i] = !CmprSumm[i] for i < l2,
            pCmprSumm[i] = CmprSumm[i] for l2 <= i < m
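
The three post-processing options can be sketched in a few lines. The function name `post_process` is hypothetical; the bit vector is represented as a Python list with index 0 as the least significant bit, and `l2` is the length of the SUT.

```python
def post_process(cmpr_summ, imm8, l2):
    """Apply the imm8[5:4] post-processing step to CmprSumm (list of bits)."""
    invert = (imm8 >> 4) & 1      # imm8[4]: perform the inversion
    sut_only = (imm8 >> 5) & 1    # imm8[5]: restrict inversion to the SUT
    if not invert:
        return list(cmpr_summ)
    if not sut_only:
        return [bit ^ 1 for bit in cmpr_summ]
    return [bit ^ 1 if i < l2 else bit for i, bit in enumerate(cmpr_summ)]
```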


1.5.5 Output Option Selection 

For PCMPESTRI and PCMPISTRI, imm8[6] determines whether the index of the lowest set bit or the 
highest set bit of pCmprSumm is written to ECX, as shown in Table 1-6. 


Table 1-6. Indexed Output Option Selection 

Imm8[6]   Description
0b        Return the index of the least significant set bit in pCmprSumm.
1b        Return the index of the most significant set bit in pCmprSumm.
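
A minimal sketch of the index selection follows. The name `index_output` is hypothetical, and because this section does not describe the value written to ECX when pCmprSumm has no set bits, the sketch returns None in that case rather than guessing.

```python
def index_output(p_cmpr_summ, imm8):
    """Pick the index of the lowest or highest set bit of pCmprSumm."""
    set_bits = [i for i, bit in enumerate(p_cmpr_summ) if bit]
    if not set_bits:
        return None  # case not covered by this section
    if (imm8 >> 6) & 1:
        return set_bits[-1]  # most significant set bit
    return set_bits[0]       # least significant set bit
```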


For PCMPESTRM and PCMPISTRM, imm8[6] specifies whether the output from the instruction is a 
bit mask or an expanded mask. The bit mask is a copy of pCmprSumm zero-extended to 128 bits. The 
expanded mask is a packed vector of byte or word elements, as determined by the string operand 
format (as indicated by imm8[0]). The expanded mask is generated by copying each bit of 
pCmprSumm to all bits of the element of the same index. Table 1-7 below shows the encoding of 
imm8[6]. 


Table 1-7. Masked Output Option Selection 

Imm8[6]   Description
0b        Return pCmprSumm as the output with zero extension to 128 bits.
1b        Return expanded pCmprSumm byte or word mask.
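
The two mask formats can be illustrated with a sketch that returns the 128-bit result as a Python integer. The name `mask_output` is hypothetical; the element width is taken from imm8[0] as described above.

```python
def mask_output(p_cmpr_summ, imm8):
    """Build the 128-bit mask result from pCmprSumm (list of bits)."""
    width = 16 if imm8 & 1 else 8  # imm8[0]: word or byte elements
    if not (imm8 >> 6) & 1:
        # Bit mask: pCmprSumm zero-extended to 128 bits.
        return sum(bit << i for i, bit in enumerate(p_cmpr_summ))
    # Expanded mask: each bit is copied to every bit of its element.
    element = (1 << width) - 1
    return sum(element << (i * width)
               for i, bit in enumerate(p_cmpr_summ) if bit)
```

For example, a byte-format pCmprSumm with bits 0 and 2 set yields 5h as a bit mask and FF00FFh as an expanded mask.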


The PCMPESTRM and PCMPISTRM instructions return their output in register XMM0. For the 
extended forms of the instructions, bits [255:128] of YMM0 are cleared. 

1.5.6 Affect on Flags 

The execution of a string compare instruction updates the state of the CF, PF, AF, ZF, SF, and OF flags 
within the rFLAGS register. All other flags are unaffected. The PF and AF flags are always cleared. 
The ZF and SF flags are set or cleared based on attributes of the source strings and the CF and OF flags 
are set or cleared based on attributes of the summary bit vector after post-processing. 

The CF flag is cleared if the summary bit vector, after post processing, is zero; the flag is set if one or 
more of the bits in the post-processed bit vector are 1. The OF flag is updated to match the value of the 
least significant bit of the post-processed summary bit vector. 

The ZF flag is set if the length of the second string operand (SUT) is shorter than m, the maximum 
number of 8-bit or 16-bit characters that can be packed into 128 bits. Similarly, the SF flag is set if the 
length of the first string operand (prototype) is shorter than m. 
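
The flag rules described above can be collected into one small sketch. The function name `string_compare_flags` is hypothetical; `l1` and `l2` are the lengths of the prototype and the SUT, and `m` is the maximum string length for the selected character size.

```python
def string_compare_flags(p_cmpr_summ, l1, l2, m):
    """Return the flag values produced by a string compare instruction."""
    return {
        "PF": 0,                          # always cleared
        "AF": 0,                          # always cleared
        "SF": int(l1 < m),                # prototype shorter than maximum
        "ZF": int(l2 < m),                # SUT shorter than maximum
        "CF": int(any(p_cmpr_summ)),      # post-processed vector is nonzero
        "OF": p_cmpr_summ[0],             # least significant bit of vector
    }
```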

This information is summarized in Table 1-8 below. 

Table 1-8. State of Affected Flags After Execution 

         Unconditional   Source String Length   Post-processed Bit Vector
Flag     PF     AF       SF         ZF          CF              OF
State    0      0        (l1 < m)   (l2 < m)    pCmprSumm ≠ 0   pCmprSumm[0]

2 Instruction Reference 


Instructions are listed by mnemonic, in alphabetic order. Each entry describes instruction function, 
syntax, opcodes, affected flags and exceptions related to the instruction. 

Figure 2-1 shows the conventions used in the descriptions. Items that do not pertain to a particular 
instruction, such as a synopsis of the 256-bit form, may be omitted. 


INST, VINST    Instruction Mnemonic Expansion 

Brief functional description 

INST 
    Description of legacy version of instruction. 
VINST 
    Description of extended version of instruction. 
XMM Encoding 
    Description of 128-bit extended instruction. 
YMM Encoding 
    Description of 256-bit extended instruction. 

Information about CPUID functions related to the instruction set. 

Synopsis diagrams for legacy and extended versions of the instruction. 

Mnemonic                         Opcode      Description
INST xmm1, xmm2/mem128           FF FF /r    Brief summary of legacy operation.

Mnemonic                         Encoding
                                 VEX   RXB.mmmmm   W.vvvv.L.pp   Opcode
VINST xmm1, xmm2/mem128, xmm3    C4    RXB.11      0.src.0.00    FF /r
VINST ymm1, ymm2/mem256, ymm3    C4    RXB.11      0.src.0.00    FF /r

Related Instructions 
    Instructions that perform similar or related functions. 
rFLAGS Affected 
    Rflags diagram. 
MXCSR Flags Affected 
    MXCSR diagram. 
Exceptions 
    Exception summary table. 

Figure 2-1. Typical Instruction Description 


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Instruction Exceptions 

Under various conditions instructions described below can cause exceptions. The conditions that 
cause these exceptions can differ based on processor mode and instruction subset. This information is 
summarized at the end of each instruction reference page in an Exception Table. Rows list the 
applicable exceptions and the different conditions that trigger each exception for the instruction. For each 
processor mode (real, virtual, and protected) a symbol in the table indicates whether this exception 
condition applies. 

Each AVX instruction has a legacy form that comes from one of the legacy (SSE1, SSE2,...) subsets. 
An “X” at the intersection of a processor mode column and an exception cause row indicates that the 
causing condition and potential exception applies to both the AVX instruction and the legacy SSE 
instruction. “A” indicates that the causing condition applies only to the AVX instruction and “S” 
indicates that the condition applies to the SSE legacy instruction. 

Note that XOP and FMA4 instructions do not have corresponding instructions from the SSE legacy 
subsets. In the exception tables for these instructions, “X” represents the XOP instruction and “F” 
represents the FMA4 instruction. 





26568 — Rev. 3.23—February 2019 



AMD64 Technology 


ADDPD Add 

VADDPD Packed Double-Precision Floating-Point 

Adds each packed double-precision floating-point value of the first source operand to the 
corresponding value of the second source operand and writes the result of each addition into the corresponding 
quadword of the destination. 

There are legacy and extended forms of the instruction: 

ADDPD 

Adds two pairs of values. 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VADDPD 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Adds two pairs of values. 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. 

YMM Encoding 

Adds four pairs of values. 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination is a third YMM register. 
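
The difference between the three forms, in particular what happens to bits [255:128] of the destination, can be illustrated with a small Python model. This is a sketch under assumptions, not a description of hardware: each 256-bit register is represented as a list of four Python floats (IEEE double precision), element 0 holding bits [63:0], and the function names are hypothetical.

```python
def addpd(dest, src):
    """Legacy ADDPD: adds the two low pairs; upper 128 bits of dest are kept."""
    return [dest[0] + src[0], dest[1] + src[1], dest[2], dest[3]]

def vaddpd_xmm(src1, src2):
    """VADDPD, XMM encoding: adds the two low pairs; upper 128 bits cleared."""
    return [src1[0] + src2[0], src1[1] + src2[1], 0.0, 0.0]

def vaddpd_ymm(src1, src2):
    """VADDPD, YMM encoding: adds all four pairs."""
    return [a + b for a, b in zip(src1, src2)]

ymm_a = [1.0, 2.0, 9.0, 9.0]
ymm_b = [0.5, 0.5, 7.0, 7.0]
print(addpd(ymm_a, ymm_b))       # upper two elements of ymm_a survive
print(vaddpd_xmm(ymm_a, ymm_b))  # upper two elements become zero
print(vaddpd_ymm(ymm_a, ymm_b))  # all four elements are summed
```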

Instruction Support 

Form      Subset    Feature Flag
ADDPD     SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VADDPD    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 
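
As an illustration of how the feature flags above are tested, the following Python sketch checks the two bits in register values that are assumed to have been obtained already from CPUID function 0000_0001h. Python cannot execute CPUID itself, so the EDX and ECX values are plain integer parameters, and the function names are hypothetical.

```python
SSE2_EDX_BIT = 26  # CPUID Fn0000_0001_EDX[SSE2]
AVX_ECX_BIT = 28   # CPUID Fn0000_0001_ECX[AVX]

def has_sse2(edx):
    """True if the EDX value from CPUID Fn0000_0001 reports SSE2 (ADDPD)."""
    return bool((edx >> SSE2_EDX_BIT) & 1)

def has_avx(ecx):
    """True if the ECX value from CPUID Fn0000_0001 reports AVX (VADDPD)."""
    return bool((ecx >> AVX_ECX_BIT) & 1)
```

A set AVX feature bit alone is not sufficient to execute VADDPD: as the exception table for the instruction shows, operating-system support (CR4.OSXSAVE and XFEATURE_ENABLED_MASK[2:1]) must also be in place.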


Instruction Encoding 

Mnemonic                          Opcode         Description
ADDPD xmm1, xmm2/mem128           66 0F 58 /r    Adds two packed double-precision floating-point values in xmm1 to corresponding values in xmm2 or mem128. Writes results to xmm1.

                                  Encoding
Mnemonic                          VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VADDPD xmm1, xmm2, xmm3/mem128    C4     RXB.00001         X.src.0.01     58 /r
VADDPD ymm1, ymm2, ymm3/mem256    C4     RXB.00001         X.src.1.01     58 /r
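
To make the encoding fields above concrete, here is a minimal Python sketch that assembles the register-to-register forms of VADDPD with the three-byte (C4h) VEX prefix. It relies on the general VEX definition, in which the R, X, B, and vvvv fields are stored in inverted (ones' complement) form. The helper names are hypothetical, W is set to 0 because the table marks it as don't-care (X), and memory operands are not handled.

```python
def vex3(r, x, b, map_select, w, vvvv, l, pp):
    """Build a three-byte VEX prefix; r, x, b and vvvv are given uninverted."""
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (map_select & 0x1F)
    byte2 = ((w & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 3)
    return bytes([0xC4, byte1, byte2])

def encode_vaddpd(dest, src1, src2, ymm=False):
    """Encode VADDPD dest, src1, src2 for register numbers 0..15."""
    prefix = vex3(r=dest >> 3, x=0, b=src2 >> 3,
                  map_select=0b00001,   # opcode map 1 (0Fh)
                  w=0,                  # don't-care for this instruction
                  vvvv=src1,            # first source register
                  l=int(ymm),           # 0 = 128-bit, 1 = 256-bit
                  pp=0b01)              # implied 66h prefix
    modrm = 0xC0 | ((dest & 7) << 3) | (src2 & 7)  # mod = 11b: register form
    return prefix + bytes([0x58, modrm])

print(encode_vaddpd(0, 1, 2).hex(" "))            # vaddpd xmm0, xmm1, xmm2
print(encode_vaddpd(0, 1, 2, ymm=True).hex(" "))  # vaddpd ymm0, ymm1, ymm2
```

Assemblers usually pick the shorter two-byte (C5h) prefix when it can express the same fields, so real object code for these operands may differ from the three-byte form built here.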




Related Instructions 

(V)ADDPS, (V)ADDSD, (V)ADDSS 

rFLAGS Affected 

None 


MXCSR Flags Affected 


Flag    MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                               M   M   M       M   M
Bit     17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 
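
The bit positions in the diagram above can be turned into a small decoder. The following Python sketch is illustrative only: the function and table names are hypothetical, and the MXCSR value is assumed to have been read by other means (for example with STMXCSR).

```python
# Single-bit MXCSR fields and their bit positions, as in the diagram above.
MXCSR_BITS = {
    "IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5,
    "DAZ": 6, "IM": 7, "DM": 8, "ZM": 9, "OM": 10, "UM": 11, "PM": 12,
    "FZ": 15, "MM": 17,
}

# Exception flags that ADDPD/VADDPD may modify (marked M in the diagram).
ADDPD_MODIFIED = {"PE", "UE", "OE", "DE", "IE"}

def decode_mxcsr(value):
    """Split an MXCSR value into named fields; RC is the 2-bit field 14:13."""
    fields = {name: (value >> bit) & 1 for name, bit in MXCSR_BITS.items()}
    fields["RC"] = (value >> 13) & 0b11
    return fields

fields = decode_mxcsr(0x1F80)  # all six exception masks set, no flags set
print({name: fields[name] for name in sorted(ADDPD_MODIFIED)})
```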


Exceptions 


Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.
Overflow, OE                S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
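
The two rows for unmasked SIMD floating-point exceptions differ only in the state of CR4.OSXMMEXCPT. The following Python sketch models that choice; it is a simplification with hypothetical names, taking the six exception flags and the six mask bits as integers in MXCSR order (bit 0 = IE through bit 5 = PE).

```python
def simd_fp_fault(exception_flags, mask_bits, osxmmexcpt):
    """Return the exception raised for a SIMD floating-point condition.

    exception_flags -- 6-bit value, bit 0 = IE ... bit 5 = PE
    mask_bits       -- 6-bit value, bit 0 = IM ... bit 5 = PM (1 = masked)
    osxmmexcpt      -- value of CR4.OSXMMEXCPT (0 or 1)
    """
    unmasked = exception_flags & ~mask_bits & 0x3F
    if not unmasked:
        return None                       # every raised condition is masked
    return "#XF" if osxmmexcpt else "#UD"
```

With all masks set (the usual default), no fault is raised and only the status flags record the condition.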
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ADDPS Add 

VADDPS Packed Single-Precision Floating-Point 

Adds each packed single-precision floating-point value of the first source operand to the 
corresponding value of the second source operand and writes the result of each addition into the corresponding 
elements of the destination. 

There are legacy and extended forms of the instruction: 

ADDPS 

Adds four pairs of values. 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VADDPS 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Adds four pairs of values. 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. 

YMM Encoding 

Adds eight pairs of values. 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination is a third YMM register. 
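
Because each element is a single-precision value, every sum is rounded to 24 bits of significand. The Python sketch below models this under assumptions: elements are Python floats rounded through the standard struct module, the default round-to-nearest mode is assumed, inputs are assumed to be within single-precision range, and the function names are hypothetical. Pass four elements to model the XMM forms or eight for the YMM encoding.

```python
import struct

def to_f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def addps(src1, src2):
    """Element-wise single-precision addition of two equal-length lists."""
    return [to_f32(to_f32(a) + to_f32(b)) for a, b in zip(src1, src2)]

print(addps([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]))
# 2**24 + 1 is not representable in single precision, so the sum is rounded;
# this is the kind of inexact result that the precision exception (PE) reports.
print(addps([16777216.0] * 4, [1.0] * 4))
```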

Instruction Support 

Form      Subset    Feature Flag
ADDPS     SSE1      CPUID Fn0000_0001_EDX[SSE] (bit 25)
VADDPS    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 


Instruction Encoding 

Mnemonic                          Opcode      Description
ADDPS xmm1, xmm2/mem128           0F 58 /r    Adds four packed single-precision floating-point values in xmm1 to corresponding values in xmm2 or mem128. Writes results to xmm1.

                                  Encoding
Mnemonic                          VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VADDPS xmm1, xmm2, xmm3/mem128    C4     RXB.00001         X.src.0.00     58 /r
VADDPS ymm1, ymm2, ymm3/mem256    C4     RXB.00001         X.src.1.00     58 /r


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Related Instructions 

(V)ADDPD, (V)ADDSD, (V)ADDSS 

rFLAGS Affected 

None 


MXCSR Flags Affected 


Flag    MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                               M   M   M       M   M
Bit     17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.
Overflow, OE                S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
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AMD64 Technology 


ADDSD Add 

VADDSD Scalar Double-Precision Floating-Point 

Adds the double-precision floating-point value in the low-order quadword of the first source operand 
to the corresponding value in the low-order quadword of the second source operand and writes the 
result into the low-order quadword of the destination. 

There are legacy and extended forms of the instruction: 

ADDSD 

The first source operand is an XMM register and the second source operand is either an XMM 
register or a 64-bit memory location. The first source register is also the destination register. Bits [127:64] 
of the destination and bits [255:128] of the corresponding YMM register are not affected. 

VADDSD 

The extended form of the instruction has a 128-bit encoding only. 

The first source operand is an XMM register and the second source operand is either an XMM 
register or a 64-bit memory location. The destination is a third XMM register. Bits [127:64] of the first 
source operand are copied to bits [127:64] of the destination. Bits [255:128] of the YMM register that 
corresponds to the destination are cleared. 
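
The scalar forms differ from the packed forms mainly in how the untouched bits of the destination are handled. The following Python sketch models this under assumptions: a 256-bit register is a list of four Python floats with element 0 holding bits [63:0], the second source is passed as a single float (register low quadword or 64-bit memory value), and the function names are hypothetical.

```python
def addsd(dest, src_low):
    """Legacy ADDSD: only element 0 changes; bits [255:64] are preserved."""
    return [dest[0] + src_low, dest[1], dest[2], dest[3]]

def vaddsd(src1, src2_low):
    """VADDSD: bits [127:64] copied from src1; bits [255:128] cleared."""
    return [src1[0] + src2_low, src1[1], 0.0, 0.0]

reg = [1.0, 2.0, 3.0, 4.0]
print(addsd(reg, 0.25))   # upper three elements unchanged
print(vaddsd(reg, 0.25))  # element 1 copied, elements 2 and 3 zeroed
```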

Instruction Support 


Form      Subset    Feature Flag
ADDSD     SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VADDSD    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 


Instruction Encoding 

Mnemonic                         Opcode         Description
ADDSD xmm1, xmm2/mem64           F2 0F 58 /r    Adds low-order double-precision floating-point values in xmm1 to corresponding values in xmm2 or mem64. Writes results to xmm1.

                                 Encoding
Mnemonic                         VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VADDSD xmm1, xmm2, xmm3/mem64    C4     RXB.00001         X.src.X.11     58 /r

Related Instructions 

(V)ADDPD, (V)ADDPS, (V)ADDSS 

rFLAGS Affected 

None 
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MXCSR Flags Affected 


Flag    MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                               M   M   M       M   M
Bit     17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.
Overflow, OE                S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
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ADDSS Add 

VADDSS Scalar Single-Precision Floating-Point 

Adds the single-precision floating-point value in the low-order doubleword of the first source 
operand to the corresponding value in the low-order doubleword of the second source operand and writes 
the result into the low-order doubleword of the destination. 

There are legacy and extended forms of the instruction: 

ADDSS 

The first source operand is an XMM register and the second source operand is either an XMM 
register or a 32-bit memory location. The first source register is also the destination. Bits [127:32] of the 
destination register and bits [255:128] of the corresponding YMM register are not affected. 

VADDSS 

The extended form of the instruction has a 128-bit encoding only. 

The first source operand is an XMM register and the second source operand is either an XMM 
register or a 32-bit memory location. The destination is a third XMM register. Bits [127:32] of the first 
source register are copied to bits [127:32] of the destination. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. 

Instruction Support 


Form      Subset    Feature Flag
ADDSS     SSE1      CPUID Fn0000_0001_EDX[SSE] (bit 25)
VADDSS    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 


Instruction Encoding 


Mnemonic                         Opcode         Description
ADDSS xmm1, xmm2/mem32           F3 0F 58 /r    Adds a single-precision floating-point value in the low-order doubleword of xmm1 to a corresponding value in xmm2 or mem32. Writes results to xmm1.

                                 Encoding
Mnemonic                         VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VADDSS xmm1, xmm2, xmm3/mem32    C4     RXB.00001         X.src.X.10     58 /r


Related Instructions 

(V)ADDPD, (V)ADDPS, (V)ADDSD 


rFLAGS Affected 

None 
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MXCSR Flags Affected 


Flag    MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                               M   M   M       M   M
Bit     17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.
Overflow, OE                S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
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AMD64 Technology 


ADDSUBPD Alternating Addition and Subtraction 

VADDSUBPD Packed Double-Precision Floating-Point 

Adds the odd-numbered packed double-precision floating-point values of the first source operand to 
the corresponding values of the second source operand and writes the sum to the corresponding 
odd-numbered element of the destination; subtracts the even-numbered packed double-precision 
floating-point values of the second source operand from the corresponding values of the first source operand 
and writes the differences to the corresponding even-numbered element of the destination. 

There are legacy and extended forms of the instruction: 

ADDSUBPD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VADDSUBPD 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. 

YMM Encoding 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination is a third YMM register. 
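
The alternating pattern can be written out as a short Python model. This is an illustrative sketch with a hypothetical function name: operands are lists of Python floats (IEEE double precision) with element 0 holding bits [63:0], two elements for the XMM forms and four for the YMM encoding.

```python
def addsubpd(src1, src2):
    """Even-numbered elements: src1 - src2. Odd-numbered elements: src1 + src2."""
    return [a + b if i % 2 else a - b
            for i, (a, b) in enumerate(zip(src1, src2))]

print(addsubpd([10.0, 20.0], [1.0, 2.0]))                    # 128-bit form
print(addsubpd([10.0, 20.0, 30.0, 40.0], [1.0, 2.0, 3.0, 4.0]))  # 256-bit form
```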

Instruction Support 


Form         Subset    Feature Flag
ADDSUBPD     SSE3      CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VADDSUBPD    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 


Instruction Encoding 

Mnemonic                             Opcode         Description
ADDSUBPD xmm1, xmm2/mem128           66 0F D0 /r    Adds a value in the upper 64 bits of xmm1 to the corresponding value in xmm2 or mem128 and writes the result to the upper 64 bits of xmm1; subtracts the value in the lower 64 bits of xmm2 or mem128 from the corresponding value in xmm1 and writes the result to the lower 64 bits of xmm1.

                                     Encoding
Mnemonic                             VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VADDSUBPD xmm1, xmm2, xmm3/mem128    C4     RXB.00001         X.src.0.01     D0 /r
VADDSUBPD ymm1, ymm2, ymm3/mem256    C4     RXB.00001         X.src.1.01     D0 /r
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Related Instructions 

(V)ADDSUBPS 

rFLAGS Affected 

None 


MXCSR Flags Affected 


Flag    MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                               M   M   M       M   M
Bit     17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.
Overflow, OE                S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
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ADDSUBPS Alternating Addition and Subtraction 

VADDSUBPS Packed Single-Precision Floating Point 

Adds the second and fourth single-precision floating-point values of the first source operand to the 
corresponding values of the second source operand and writes the sums to the second and fourth 
elements of the destination. Subtracts the first and third single-precision floating-point values of the 
second source operand from the corresponding values of the first source operand and writes the 
differences to the first and third elements of the destination. 

There are legacy and extended forms of the instruction: 

ADDSUBPS 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VADDSUBPS 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. 

YMM Encoding 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form         Subset    Feature Flag
ADDSUBPS     SSE3      CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VADDSUBPS    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 

Instruction Encoding 


Mnemonic                             Opcode         Description
ADDSUBPS xmm1, xmm2/mem128           F2 0F D0 /r    Adds the second and fourth packed single-precision values in xmm2 or mem128 to the corresponding values in xmm1 and writes results to the corresponding positions of xmm1. Subtracts the first and third packed single-precision values in xmm2 or mem128 from the corresponding values in xmm1 and writes results to the corresponding positions of xmm1.

                                     Encoding
Mnemonic                             VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VADDSUBPS xmm1, xmm2, xmm3/mem128    C4     RXB.00001         X.src.0.11     D0 /r
VADDSUBPS ymm1, ymm2, ymm3/mem256    C4     RXB.00001         X.src.1.11     D0 /r
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Related Instructions 

(V)ADDSUBPD 

rFLAGS Affected 

None 


MXCSR Flags Affected 


Flag    MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                               M   M   M       M   M
Bit     17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.
Overflow, OE                S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
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AESDEC AES 

VAESDEC Decryption Round 

Performs a single round of AES decryption. Transforms a state value specified by the first source 
operand using a round key value specified by the second source operand, and writes the result to the 
destination. 

See Appendix A on page 973 for more information about the operation of the AES instructions. 

Decryption consists of Nr - 1 iterations (numbered 1, ..., Nr - 1) of sequences of operations called 
rounds, terminated by a unique final round, Nr. The AESDEC and VAESDEC instructions perform all the 
rounds except the last; the AESDECLAST and VAESDECLAST instructions perform the final round. 
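
The division of work between the two instructions can be sketched as a loop. The Python below shows only the sequencing, not the AES transformations themselves: the aesdec and aesdeclast parameters are hypothetical callables standing in for the instructions, states and keys are plain integers, and the initial XOR with the first round key is an assumption taken from the AES algorithm rather than from this page. The round keys are assumed to be already prepared for decryption and ordered as they are consumed.

```python
def decrypt_block(state, round_keys, aesdec, aesdeclast):
    """Apply Nr - 1 AESDEC rounds followed by one AESDECLAST round.

    round_keys holds Nr + 1 entries: one for the initial key addition
    (assumed here) and one for each of the Nr rounds.
    """
    nr = len(round_keys) - 1
    state ^= round_keys[0]                  # initial round-key addition (assumed)
    for key in round_keys[1:nr]:            # rounds 1 .. Nr - 1
        state = aesdec(state, key)
    return aesdeclast(state, round_keys[nr])  # final round Nr
```

With 11 round keys (Nr = 10, as for a 128-bit key) the loop issues nine AESDEC steps and one AESDECLAST step.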

The 128-bit state and round key vectors are interpreted as 16-byte column-major entries in a 4-by-4 
matrix of bytes. The transformed state is written to the destination in column-major order. For both 
instructions, the destination register is the same as the first source register. 
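
The column-major interpretation can be shown with a short Python sketch. It assumes that byte 0 of the 16-byte input is the least significant byte of the 128-bit operand, and the function names are hypothetical.

```python
def to_state_matrix(block):
    """Arrange 16 bytes as a 4-by-4 matrix: byte i goes to row i % 4, column i // 4."""
    return [[block[row + 4 * col] for col in range(4)] for row in range(4)]

def from_state_matrix(matrix):
    """Write a 4-by-4 byte matrix back out in column-major order."""
    return bytes(matrix[row][col] for col in range(4) for row in range(4))

matrix = to_state_matrix(bytes(range(16)))
for row in matrix:
    print(row)
```

Bytes 0 through 3 therefore fill the first column from top to bottom, bytes 4 through 7 the second column, and so on.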

There are legacy and extended forms of the instruction: 

AESDEC 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VAESDEC 

The extended form of the instruction has a 128-bit encoding only. 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. 

Instruction Support 


Form       Subset    Feature Flag
AESDEC     AES       CPUID Fn0000_0001_ECX[AES] (bit 25)
VAESDEC    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 

Instruction Encoding 

Mnemonic                           Opcode            Description
AESDEC xmm1, xmm2/mem128           66 0F 38 DE /r    Performs one decryption round on a state value in xmm1 using the key value in xmm2 or mem128. Writes results to xmm1.

                                   Encoding
Mnemonic                           VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VAESDEC xmm1, xmm2, xmm3/mem128    C4     RXB.00010         X.src.0.01     DE /r

Related Instructions 

(V)AESENC, (V)AESENCLAST, (V)AESIMC, (V)AESKEYGENASSIST 
rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
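The alignment-related rows of the exception table can be read as a small decision procedure. The sketch below is one reading of those rows, not an authoritative model: the function name and parameters are hypothetical, "alignment checking enabled" is collapsed into a single boolean, and segment-limit, null-segment, and paging causes are ignored.

```python
from typing import Optional


def alignment_exception(address: int, extended_form: bool,
                        mxcsr_mm: bool, alignment_checking: bool) -> Optional[str]:
    """Return "#GP", "#AC", or None for a 16-byte memory operand."""
    if address % 16 == 0:
        return None                       # aligned: no alignment-related fault
    if extended_form:                     # rows marked A (AVX form)
        return "#AC" if alignment_checking else None
    if not mxcsr_mm:                      # rows marked S with MXCSR.MM = 0
        return "#GP"
    return "#AC" if alignment_checking else None   # rows marked S, MXCSR.MM = 1


print(alignment_exception(0x1008, extended_form=False,
                          mxcsr_mm=False, alignment_checking=False))
```

Under this reading, a misaligned operand of a legacy instruction faults with #GP unless MXCSR.MM is set, while the extended form faults only when alignment checking is enabled.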
AESDECLAST, VAESDECLAST

AES Last Decryption Round

Performs the final round of AES decryption. Completes transformation of a state value specified by 
the first source operand using a round key value specified by the second source operand, and writes 
the result to the destination. 

See Appendix A on page 973 for more information about the operation of the AES instructions. 

Decryption consists of 1, ..., Nr - 1 iterations of sequences of operations called rounds, terminated by
a unique final round, Nr. The AESDEC and VAESDEC instructions perform all the rounds before the
final round; the AESDECLAST and VAESDECLAST instructions perform the final round.

The 128-bit state and round key vectors are interpreted as 16-byte column-major entries in a 4-by-4
matrix of bytes. The transformed state is written to the destination in column-major order. For both
instructions, the destination register is the same as the first source register.
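The division of work between AESDEC and AESDECLAST can be sketched as a round schedule. This is an illustration under stated assumptions: the Nr values per key size come from the AES standard (FIPS-197) rather than from this section, the function name is hypothetical, and the initial round-key addition that precedes round 1 is left out.

```python
# Number of rounds Nr per key size, as defined for AES in FIPS-197.
ROUNDS = {128: 10, 192: 12, 256: 14}


def decryption_schedule(key_bits: int) -> list:
    """List which instruction handles each decryption round 1..Nr."""
    nr = ROUNDS[key_bits]
    schedule = [(rnd, "AESDEC") for rnd in range(1, nr)]   # rounds 1..Nr-1
    schedule.append((nr, "AESDECLAST"))                    # final round Nr
    return schedule


for rnd, mnemonic in decryption_schedule(128):
    print(rnd, mnemonic)
```

The same structure applies to the extended forms, with VAESDEC and VAESDECLAST in place of the legacy mnemonics.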

There are legacy and extended forms of the instruction: 

AESDECLAST 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VAESDECLAST 

The extended form of the instruction has a 128-bit encoding only.

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

Instruction Support 


Form           Subset   Feature Flag
AESDECLAST     AES      CPUID Fn0000_0001_ECX[AES] (bit 25)
VAESDECLAST    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                         Opcode            Description
AESDECLAST xmm1, xmm2/mem128     66 0F 38 DF /r    Performs the last decryption round on a state
                                                   value in xmm1 using the key value in xmm2 or
                                                   mem128. Writes results to xmm1.

Mnemonic                                Encoding
                                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VAESDECLAST xmm1, xmm2, xmm3/mem128     C4    RXB.00010        X.src.0.01    DF /r

Related Instructions 

(V)AESENC, (V)AESENCLAST, (V)AESIMC, (V)AESKEYGENASSIST 


rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
AESENC, VAESENC

AES Encryption Round

Performs a single round of AES encryption. Transforms a state value specified by the first source 
operand using a round key value specified by the second source operand, and writes the result to the 
destination. 

See Appendix A on page 973 for more information about the operation of the AES instructions. 

Encryption consists of 1, ..., Nr - 1 iterations of sequences of operations called rounds, terminated by
a unique final round, Nr. The AESENC and VAESENC instructions perform all the rounds before the
final round; the AESENCLAST and VAESENCLAST instructions perform the final round.

The 128-bit state and round key vectors are interpreted as 16-byte column-major entries in a 4-by-4
matrix of bytes. The transformed state is written to the destination in column-major order. For both
instructions, the destination register is the same as the first source register.

There are legacy and extended forms of the instruction: 

AESENC 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VAESENC 

The extended form of the instruction has a 128-bit encoding only.

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

Instruction Support 


Form       Subset   Feature Flag
AESENC     AES      CPUID Fn0000_0001_ECX[AES] (bit 25)
VAESENC    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                     Opcode            Description
AESENC xmm1, xmm2/mem128     66 0F 38 DC /r    Performs one encryption round on a state value
                                               in xmm1 using the key value in xmm2 or
                                               mem128. Writes results to xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VAESENC xmm1, xmm2, xmm3/mem128     C4    RXB.00010        X.src.0.01    DC /r

Related Instructions

(V)AESDEC, (V)AESDECLAST, (V)AESIMC, (V)AESKEYGENASSIST


rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
AESENCLAST, VAESENCLAST

AES Last Encryption Round

Performs the final round of AES encryption. Completes transformation of a state value specified by 
the first source operand using a round key value specified by the second source operand, and writes 
the result to the destination. 

See Appendix A on page 973 for more information about the operation of the AES instructions. 

Encryption consists of 1, ..., Nr - 1 iterations of sequences of operations called rounds, terminated by
a unique final round, Nr. The AESENC and VAESENC instructions perform all the rounds before the
final round; the AESENCLAST and VAESENCLAST instructions perform the final round.

The 128-bit state and round key vectors are interpreted as 16-byte column-major entries in a 4-by-4
matrix of bytes. The transformed state is written to the destination in column-major order. For both
instructions, the destination register is the same as the first source register.

There are legacy and extended forms of the instruction: 

AESENCLAST 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VAESENCLAST 

The extended form of the instruction has a 128-bit encoding only.

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

Instruction Support 


Form           Subset   Feature Flag
AESENCLAST     AES      CPUID Fn0000_0001_ECX[AES] (bit 25)
VAESENCLAST    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                         Opcode            Description
AESENCLAST xmm1, xmm2/mem128     66 0F 38 DD /r    Performs the last encryption round on a
                                                   state value in xmm1 using the key value in xmm2
                                                   or mem128. Writes results to xmm1.

Mnemonic                                Encoding
                                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VAESENCLAST xmm1, xmm2, xmm3/mem128     C4    RXB.00010        X.src.0.01    DD /r

Related Instructions 

(V)AESDEC, (V)AESDECLAST, (V)AESIMC, (V)AESKEYGENASSIST 


rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
AESIMC, VAESIMC

AES InvMixColumn Transformation

Applies the AES InvMixColumns() transformation to expanded round keys in preparation for
decryption. Transforms an expanded key specified by the second source operand and writes the result
to a destination register.

See Appendix A on page 973 for more information about the operation of the AES instructions.

The 128-bit round key vector is interpreted as 16-byte column-major entries in a 4-by-4 matrix of
bytes. The transformed result is written to the destination in column-major order.

AESIMC and VAESIMC are not used to transform the first and last round key in a decryption 
sequence. 
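The InvMixColumns() transformation itself is defined by the AES standard rather than in this section. As a software sketch of what AESIMC computes on a column-major round key, assuming the FIPS-197 definition (multiplication in GF(2^8) by the constants 0E, 0B, 0D, and 09) and assuming that bytes 4k to 4k+3 of the operand form column k:

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B          # reduce by the AES field polynomial
        b >>= 1
    return result


def inv_mix_column(col: list) -> list:
    """Apply InvMixColumns to one 4-byte column."""
    c0, c1, c2, c3 = col
    return [
        gf_mul(c0, 0x0E) ^ gf_mul(c1, 0x0B) ^ gf_mul(c2, 0x0D) ^ gf_mul(c3, 0x09),
        gf_mul(c0, 0x09) ^ gf_mul(c1, 0x0E) ^ gf_mul(c2, 0x0B) ^ gf_mul(c3, 0x0D),
        gf_mul(c0, 0x0D) ^ gf_mul(c1, 0x09) ^ gf_mul(c2, 0x0E) ^ gf_mul(c3, 0x0B),
        gf_mul(c0, 0x0B) ^ gf_mul(c1, 0x0D) ^ gf_mul(c2, 0x09) ^ gf_mul(c3, 0x0E),
    ]


def inv_mix_columns(key: bytes) -> bytes:
    """Transform a 16-byte round key stored in column-major order."""
    out = bytearray()
    for i in range(0, 16, 4):            # bytes 4k..4k+3 form column k
        out += bytes(inv_mix_column(list(key[i:i + 4])))
    return bytes(out)


print(inv_mix_columns(bytes([0x8E, 0x4D, 0xA1, 0xBC] * 4)).hex())
```

The function names are hypothetical; the hardware instruction performs the whole transformation in a single operation.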

There are legacy and extended forms of the instruction: 

AESIMC 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VAESIMC 

The extended form of the instruction has a 128-bit encoding only.

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

Instruction Support 


Form       Subset   Feature Flag
AESIMC     AES      CPUID Fn0000_0001_ECX[AES] (bit 25)
VAESIMC    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                     Opcode            Description
AESIMC xmm1, xmm2/mem128     66 0F 38 DB /r    Performs AES InvMixColumn transformation on
                                               a round key in the xmm2 or mem128 and stores
                                               the result in xmm1.

Mnemonic                      Encoding
                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VAESIMC xmm1, xmm2/mem128     C4    RXB.00010        X.src.0.01    DB /r

Related Instructions 

(V)AESDEC, (V)AESDECLAST, (V)AESENC, (V)AESENCLAST, (V)AESKEYGENASSIST 

rFLAGS Affected 

None 




MXCSR Flags Affected 

None 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
AESKEYGENASSIST, VAESKEYGENASSIST

AES Assist Round Key Generation

Expands a round key for encryption. Transforms a 128-bit round key operand using an 8-bit round 
constant and writes the result to a destination register. 

See Appendix A on page 973 for more information about the operation of the AES instructions. 

The round key is provided by the second source operand and the round constant is specified by an 
immediate operand. The 128-bit round key vector is interpreted as 16-byte column-major entries in a 
4-by-4 matrix of bytes. The transformed result is written to the destination in column-major order. 
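This section does not spell out the transformation in detail. The sketch below follows the commonly documented behavior of the instruction together with the FIPS-197 definitions of SubWord and RotWord, and should be checked against Appendix A before being relied on. All function names are hypothetical; the S-box is computed on the fly rather than stored as a table, and the operand is modelled as a 128-bit Python integer with doubleword 0 in bits [31:0].

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def sub_byte(x: int) -> int:
    """AES S-box: multiplicative inverse in GF(2^8), then the affine map."""
    inv = 0
    if x:
        inv = next(y for y in range(1, 256) if gf_mul(x, y) == 1)
    s = inv
    for shift in range(1, 5):
        s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF   # 8-bit rotate left
    return s ^ 0x63


def sub_word(w: int) -> int:
    """Apply the S-box to each byte of a 32-bit word."""
    return sum(sub_byte((w >> (8 * i)) & 0xFF) << (8 * i) for i in range(4))


def rot_word(w: int) -> int:
    """Rotate a 32-bit word right by 8 bits."""
    return ((w >> 8) | (w << 24)) & 0xFFFFFFFF


def aeskeygenassist(src: int, imm8: int) -> int:
    """Software model of the key-generation assist step (assumed semantics)."""
    x1 = (src >> 32) & 0xFFFFFFFF     # source doubleword 1
    x3 = (src >> 96) & 0xFFFFFFFF     # source doubleword 3
    rcon = imm8 & 0xFF                # round constant, zero-extended
    d0 = sub_word(x1)
    d1 = rot_word(sub_word(x1)) ^ rcon
    d2 = sub_word(x3)
    d3 = rot_word(sub_word(x3)) ^ rcon
    return d0 | (d1 << 32) | (d2 << 64) | (d3 << 96)


print(hex(aeskeygenassist(0, 0x01)))
```

In a full key expansion, software combines this result with shifted copies of the previous round key; that step is outside the scope of this sketch.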

There are legacy and extended forms of the instruction:

AESKEYGENASSIST 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VAESKEYGENASSIST 

The extended form of the instruction has a 128-bit encoding only. 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

Instruction Support 


Form                Subset   Feature Flag
AESKEYGENASSIST     AES      CPUID Fn0000_0001_ECX[AES] (bit 25)
VAESKEYGENASSIST    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                                    Opcode               Description
AESKEYGENASSIST xmm1, xmm2/mem128, imm8     66 0F 3A DF /r ib    Expands a round key in xmm2 or
                                                                 mem128 using an immediate
                                                                 round constant. Writes the result
                                                                 to xmm1.

Mnemonic                                     Encoding
                                             VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VAESKEYGENASSIST xmm1, xmm2/mem128, imm8     C4    RXB.00011        X.src.0.01    DF /r ib

Related Instructions

(V)AESDEC, (V)AESDECLAST, (V)AESENC, (V)AESENCLAST, (V)AESIMC

rFLAGS Affected 

None 


MXCSR Flags Affected

None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
ANDNPD, VANDNPD

AND NOT Packed Double-Precision Floating-Point

Performs a bitwise AND of two packed double-precision floating-point values in the second source 
operand with the ones’-complement of the two corresponding packed double-precision floating-point 
values in the first source operand and writes the result into the destination. 
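The operation amounts to (NOT first source) AND second source on each 64-bit element. A minimal software sketch, assuming each operand is represented as a list of two Python floats (the helper names are hypothetical):

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF


def double_bits(value: float) -> int:
    """Return the 64-bit pattern of a double-precision value."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def bits_double(bits: int) -> float:
    """Return the double-precision value for a 64-bit pattern."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def andnpd(src1: list, src2: list) -> list:
    """Element-wise (NOT src1) AND src2 on two pairs of doubles."""
    return [bits_double((~double_bits(a) & MASK64) & double_bits(b))
            for a, b in zip(src1, src2)]


# With -0.0 (sign bit only) as the first source, the sign bit of each
# element of the second source is cleared.
print(andnpd([-0.0, -0.0], [-3.5, 2.0]))
```

Using a sign-bit mask as the first source is one common use of the operation: it yields the absolute value of each element of the second source.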

There are legacy and extended forms of the instruction: 

ANDNPD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VANDNPD 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination is a third YMM register. 
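The three forms differ in how they treat bits [255:128] of the destination's YMM register. As an illustrative model only, with a YMM register represented as a 256-bit Python integer and hypothetical function names:

```python
LOW128 = (1 << 128) - 1
ALL256 = (1 << 256) - 1


def write_legacy(ymm: int, result128: int) -> int:
    """Legacy form: replace bits [127:0], leave bits [255:128] unchanged."""
    return (ymm & ~LOW128) | (result128 & LOW128)


def write_vex128(ymm: int, result128: int) -> int:
    """Extended form, XMM encoding: bits [255:128] are cleared."""
    return result128 & LOW128


def write_vex256(ymm: int, result256: int) -> int:
    """Extended form, YMM encoding: all 256 bits are written."""
    return result256 & ALL256


old = (0xAB << 128) | 0x11
print(hex(write_legacy(old, 0x22)))
print(hex(write_vex128(old, 0x22)))
```

The previous register contents are passed to every function only so the three signatures match; the extended forms ignore them.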

Instruction Support 


Form       Subset   Feature Flag
ANDNPD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VANDNPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                     Opcode         Description
ANDNPD xmm1, xmm2/mem128     66 0F 55 /r    Performs bitwise AND of two packed double-precision
                                            floating-point values in xmm2 or mem128 with the
                                            ones’-complement of two packed double-precision
                                            floating-point values in xmm1. Writes the result to xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VANDNPD xmm1, xmm2, xmm3/mem128     C4    RXB.00001        X.src.0.01    55 /r
VANDNPD ymm1, ymm2, ymm3/mem256     C4    RXB.00001        X.src.1.01    55 /r


Related Instructions 

(V)ANDNPS, (V)ANDPD, (V)ANDPS, (V)ORPD, (V)ORPS, (V)XORPD, (V)XORPS 


rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
ANDNPS, VANDNPS

AND NOT Packed Single-Precision Floating-Point

Performs a bitwise AND of four packed single-precision floating-point values in the second source 
operand with the ones’-complement of the four corresponding packed single-precision floating-point 
values in the first source operand, and writes the result in the destination. 

There are legacy and extended forms of the instruction: 

ANDNPS 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VANDNPS 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form       Subset   Feature Flag
ANDNPS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VANDNPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                     Opcode      Description
ANDNPS xmm1, xmm2/mem128     0F 55 /r    Performs bitwise AND of four packed single-precision
                                         floating-point values in xmm2 or mem128 with the
                                         ones’-complement of four packed single-precision
                                         floating-point values in xmm1. Writes the result to xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VANDNPS xmm1, xmm2, xmm3/mem128     C4    RXB.00001        X.src.0.00    55 /r
VANDNPS ymm1, ymm2, ymm3/mem256     C4    RXB.00001        X.src.1.00    55 /r


Related Instructions 

(V)ANDNPD, (V)ANDPD, (V)ANDPS, (V)ORPD, (V)ORPS, (V)XORPD, (V)XORPS 


rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
ANDPD, VANDPD

AND Packed Double-Precision Floating-Point

Performs bitwise AND of two packed double-precision floating-point values in the first source
operand with the corresponding two packed double-precision floating-point values in the second source
operand and writes the results into the corresponding elements of the destination.

There are legacy and extended forms of the instruction: 

ANDPD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VANDPD 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form      Subset   Feature Flag
ANDPD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VANDPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                    Opcode         Description
ANDPD xmm1, xmm2/mem128     66 0F 54 /r    Performs bitwise AND of two packed double-precision
                                           floating-point values in xmm1 with corresponding values in
                                           xmm2 or mem128. Writes the result to xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VANDPD xmm1, xmm2, xmm3/mem128     C4    RXB.00001        X.src.0.01    54 /r
VANDPD ymm1, ymm2, ymm3/mem256     C4    RXB.00001        X.src.1.01    54 /r


Related Instructions 

(V)ANDNPD, (V)ANDNPS, (V)ANDPS, (V)ORPD, (V)ORPS, (V)XORPD, (V)XORPS 


rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
ANDPS, VANDPS

AND Packed Single-Precision Floating-Point

Performs bitwise AND of the four packed single-precision floating-point values in the first source 
operand with the corresponding four packed single-precision floating-point values in the second 
source operand, and writes the result into the corresponding elements of the destination. 
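Because the AND is purely bitwise, the element width matters only for how the operands are interpreted. A minimal sketch of the 128-bit case, assuming each operand is held as 16 little-endian bytes (the function name is hypothetical):

```python
import struct


def andps(src1: bytes, src2: bytes) -> bytes:
    """Bitwise AND of two 16-byte operands, taken as four 32-bit elements."""
    a = struct.unpack("<4I", src1)
    b = struct.unpack("<4I", src2)
    return struct.pack("<4I", *(x & y for x, y in zip(a, b)))


# Clearing the sign bit of each single-precision element.
values = struct.pack("<4f", -1.0, 2.5, -0.0, 8.0)
abs_mask = struct.pack("<4I", *([0x7FFFFFFF] * 4))
print(struct.unpack("<4f", andps(values, abs_mask)))
```

The mask with all bits set except the sign bit is one typical second operand; any 128-bit pattern can be used.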

There are legacy and extended forms of the instruction: 

ANDPS 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VANDPS 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form      Subset   Feature Flag
ANDPS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VANDPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                    Opcode      Description
ANDPS xmm1, xmm2/mem128     0F 54 /r    Performs bitwise AND of four packed single-precision
                                        floating-point values in xmm1 with corresponding values in
                                        xmm2 or mem128. Writes the result to xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VANDPS xmm1, xmm2, xmm3/mem128     C4    RXB.00001        X.src.0.00    54 /r
VANDPS ymm1, ymm2, ymm3/mem256     C4    RXB.00001        X.src.1.00    54 /r

Related Instructions

(V)ANDNPD, (V)ANDNPS, (V)ANDPD, (V)ORPD, (V)ORPS, (V)XORPD, (V)XORPS


rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
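
The AVX-specific #UD conditions listed in the table combine a CPUID feature check with operating-system enablement. The sketch below models only those conditions (it ignores the prefix-related causes). The function and parameter names are hypothetical, and the caller is assumed to supply the register values.

```python
def vandps_raises_ud(cpuid_ecx: int, cr4_osxsave: bool,
                     xfeature_enabled_mask: int, protected_mode: bool = True) -> bool:
    """Return True when the modeled conditions would cause #UD for VANDPS."""
    avx_supported = ((cpuid_ecx >> 28) & 1) == 1                  # CPUID Fn0000_0001_ECX[AVX]
    state_enabled = ((xfeature_enabled_mask >> 1) & 0b11) == 0b11  # bits [2:1] must be 11b
    usable = avx_supported and protected_mode and cr4_osxsave and state_enabled
    return not usable
```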
BLENDPD Blend 

VBLENDPD Packed Double-Precision Floating-Point 

Copies packed double-precision floating-point values from either of two sources to a destination, as 
specified by an 8-bit mask operand. 

Each mask bit specifies a 64-bit element in a source location and a corresponding 64-bit element in 
the destination register. When a mask bit = 0, the specified element of the first source is copied to the 
corresponding position in the destination register. When a mask bit = 1, the specified element of the 
second source is copied to the corresponding position in the destination register. 

There are legacy and extended forms of the instruction: 

BLENDPD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. Only mask bits [1:0] are used. 

VBLENDPD 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. Only mask bits [1:0] are used. 

YMM Encoding 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination is a third YMM register. Only mask bits [3:0] are used. 
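
The selection rule can be sketched as follows. Registers are modeled as Python lists with index 0 holding the lowest-order element; the function name is hypothetical. Because only as many mask bits are examined as there are elements, higher bits of imm8 are ignored, as in the descriptions above.

```python
def blend(src1: list, src2: list, imm8: int) -> list:
    """Select each element from src1 (mask bit = 0) or src2 (mask bit = 1)."""
    return [b if (imm8 >> i) & 1 else a
            for i, (a, b) in enumerate(zip(src1, src2))]
```

For a two-element (128-bit) operand, `blend([1.0, 2.0], [10.0, 20.0], 0b10)` keeps the low element of the first source and takes the high element from the second source.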

Instruction Support 

Form        Subset   Feature Flag 
BLENDPD     SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19) 
VBLENDPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28) 

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 


Instruction Encoding 

Mnemonic                           Opcode              Description 
BLENDPD xmm1, xmm2/mem128, imm8    66 0F 3A 0D /r ib   Copies values from xmm1 or xmm2/mem128 to xmm1, 
                                                       as specified by imm8. 

Mnemonic                                  Encoding 
                                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode 
VBLENDPD xmm1, xmm2, xmm3/mem128, imm8    C4    RXB.00011        X.src.0.01    0D /r ib 
VBLENDPD ymm1, ymm2, ymm3/mem256, imm8    C4    RXB.00011        X.src.1.01    0D /r ib 


Related Instructions 

(V)BLENDPS, (V)BLENDVPD, (V)BLENDVPS 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 

Exception                  Real  Virt  Prot  Cause of Exception 
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier. 
                           A     A           AVX instructions are only recognized in protected mode. 
                           S     S     S     CR0.EM = 1. 
                           S     S     S     CR4.OSFXSR = 0. 
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. 
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b. 
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix. 
                           S     S     X     Lock prefix (F0h) preceding opcode. 
Device not available, #NM  S     S     X     CR0.TS = 1. 
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical. 
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical. 
                           S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0. 
                                       X     Null data segment used to reference memory. 
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. 
                                       A     Memory operand not 16-byte aligned when alignment checking enabled. 
Page fault, #PF                  S     X     Instruction execution caused a page fault. 

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
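
The alignment-related rows of the table can be read as a small decision procedure. The sketch below is a simplified model of those rows only, assuming a 16-byte alignment requirement as stated in the table. The function and parameter names are hypothetical.

```python
from typing import Optional


def alignment_fault(address: int, vex_encoded: bool, mxcsr_mm: bool,
                    alignment_checking: bool) -> Optional[str]:
    """Return the exception the table lists for a memory operand, or None."""
    if address % 16 == 0:
        return None                      # aligned operands never fault here
    if vex_encoded:
        # AVX form: only #AC, and only when alignment checking is enabled.
        return "#AC" if alignment_checking else None
    if not mxcsr_mm:
        return "#GP"                     # legacy form with MXCSR.MM = 0
    # Legacy form with MXCSR.MM = 1: #AC when alignment checking is enabled.
    return "#AC" if alignment_checking else None
```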
BLENDPS Blend 

VBLENDPS Packed Single-Precision Floating-Point 

Copies packed single-precision floating-point values from either of two sources to a destination, as 
specified by an 8-bit mask operand. 

Each mask bit specifies a 32-bit element in a source location and a corresponding 32-bit element in 
the destination register. When a mask bit = 0, the specified element of the first source is copied to the 
corresponding position in the destination register. When a mask bit = 1, the specified element of the 
second source is copied to the corresponding position in the destination register. 

There are legacy and extended forms of the instruction: 

BLENDPS 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. Only mask bits [3:0] are used. 

VBLENDPS 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. Only mask bits [3:0] are used. 

YMM Encoding 

The first operand is a YMM register and the second operand is either a YMM register or a 256-bit 
memory location. The destination is a third YMM register. All 8 bits of the mask are used. 
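
A mask for the immediate operand can be built from per-element choices, as in the sketch below. The helper name is hypothetical. Index 0 corresponds to the lowest-order element, and a true entry selects the second source for that element.

```python
def blend_imm8(take_second: list) -> int:
    """Pack per-element selections into an imm8 blend mask."""
    if len(take_second) > 8:
        raise ValueError("imm8 holds at most 8 mask bits")
    imm8 = 0
    for i, flag in enumerate(take_second):
        if flag:
            imm8 |= 1 << i   # mask bit i controls element i
    return imm8
```

A four-entry list produces a mask for the 128-bit forms (bits [3:0]); an eight-entry list produces a mask for the YMM encoding, which uses all 8 bits.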

Instruction Support 

Form        Subset   Feature Flag 
BLENDPS     SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19) 
VBLENDPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28) 

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 


Instruction Encoding 

Mnemonic                           Opcode              Description 
BLENDPS xmm1, xmm2/mem128, imm8    66 0F 3A 0C /r ib   Copies values from xmm1 or xmm2/mem128 to xmm1, 
                                                       as specified by imm8. 

Mnemonic                                  Encoding 
                                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode 
VBLENDPS xmm1, xmm2, xmm3/mem128, imm8    C4    RXB.00011        X.src.0.01    0C /r ib 
VBLENDPS ymm1, ymm2, ymm3/mem256, imm8    C4    RXB.00011        X.src.1.01    0C /r ib 




Related Instructions 

(V)BLENDPD, (V)BLENDVPD, (V)BLENDVPS 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 

Exception                  Real  Virt  Prot  Cause of Exception 
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier. 
                           A     A           AVX instructions are only recognized in protected mode. 
                           S     S     S     CR0.EM = 1. 
                           S     S     S     CR4.OSFXSR = 0. 
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. 
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b. 
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix. 
                           S     S     X     Lock prefix (F0h) preceding opcode. 
Device not available, #NM  S     S     X     CR0.TS = 1. 
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical. 
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical. 
                           S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0. 
                                       X     Null data segment used to reference memory. 
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. 
                                       A     Memory operand not 16-byte aligned when alignment checking enabled. 
Page fault, #PF                  S     X     Instruction execution caused a page fault. 

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
BLENDVPD Variable Blend 

VBLENDVPD Packed Double-Precision Floating-Point 

Copies packed double-precision floating-point values from either of two sources to a destination, as 
specified by a mask operand. 

Each mask bit specifies a 64-bit element of a source location and a corresponding 64-bit element of 
the destination. The position of a mask bit corresponds to the position of the most significant bit of a 
copied value. When a mask bit = 0, the specified element of the first source is copied to the corre¬ 
sponding position in the destination. When a mask bit = 1, the specified element of the second source 
is copied to the corresponding position in the destination. 

There are legacy and extended forms of the instruction: 

BLENDVPD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. The mask is defined by bits 127 
and 63 of the implicit register XMM0. 

VBLENDVPD 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. The mask is defined by bits 127 and 63 of a 
fourth XMM register. 

YMM Encoding 

The first operand is a YMM register and the second operand is either a YMM register or a 256-bit 
memory location. The destination is a third YMM register. The mask is defined by bits 255, 191, 127, 
and 63 of a fourth YMM register. 
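
Because each mask bit is the most significant bit of a 64-bit mask element, only the sign bit of each mask element matters. The sketch below models registers as lists of Python floats with index 0 as the lowest-order element. The function names are hypothetical.

```python
import struct


def msb64(value: float) -> int:
    """Return bit 63 (the sign bit) of the double-precision encoding of value."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    return bits >> 63


def blendvpd(src1: list, src2: list, mask: list) -> list:
    """Select each element from src1 (mask MSB = 0) or src2 (mask MSB = 1)."""
    return [b if msb64(m) else a for a, b, m in zip(src1, src2, mask)]
```

A mask element of -0.0 has its most significant bit set and therefore selects the second source, even though it compares equal to zero.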

Instruction Support 

Form         Subset   Feature Flag 
BLENDVPD     SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19) 
VBLENDVPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28) 

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 


Instruction Encoding 

Mnemonic                       Opcode 
BLENDVPD xmm1, xmm2/mem128     66 0F 38 15 /r 

Mnemonic 
VBLENDVPD xmm1, xmm2, xmm3/mem128, xmm4 
VBLENDVPD ymm1, ymm2, ymm3/mem256, ymm4 

Related Instructions 

(V)BLENDPD, (V)BLENDPS, (V)BLENDVPS 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 

Exception                  Real  Virt  Prot  Cause of Exception 
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier. 
                           A     A           AVX instructions are only recognized in protected mode. 
                           S     S     S     CR0.EM = 1. 
                           S     S     S     CR4.OSFXSR = 0. 
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. 
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b. 
                                       A     VEX.W = 1. 
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix. 
                           S     S     X     Lock prefix (F0h) preceding opcode. 
Device not available, #NM  S     S     X     CR0.TS = 1. 
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical. 
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical. 
                                       X     Null data segment used to reference memory. 
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. 
                                       A     Memory operand not 16-byte aligned when alignment checking enabled. 
Page fault, #PF                  S     X     Instruction execution caused a page fault. 

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 


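Description (BLENDVPD xmm1, xmm2/mem128) 

Copies values from xmm1 or xmm2/mem128 to xmm1, as specified by the MSB of corresponding 
elements of xmm0. 

Encoding (VBLENDVPD) 

Mnemonic                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode 
VBLENDVPD xmm1, xmm2, xmm3/mem128, xmm4    C4    RXB.00011        X.src.0.01    4B /r 
VBLENDVPD ymm1, ymm2, ymm3/mem256, ymm4    C4    RXB.00011        X.src.1.01    4B /r 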
BLENDVPS Variable Blend 

VBLENDVPS Packed Single-Precision Floating-Point 

Copies packed single-precision floating-point values from either of two sources to a destination, as 
specified by a mask operand. 

Each mask bit specifies a 32-bit element of a source location and a corresponding 32-bit element of 
the destination register. The position of a mask bit corresponds to the position of the most significant 
bit of a copied value. When a mask bit = 0, the specified element of the first source is copied to the 
corresponding position in the destination. When a mask bit = 1, the specified element of the second 
source is copied to the corresponding position in the destination. 

There are legacy and extended forms of the instruction: 

BLENDVPS 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. The mask is defined by bits 127, 
95, 63, and 31 of the implicit register XMM0. 

VBLENDVPS 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. The mask is defined by bits 127, 95, 63, and 31 of 
a fourth XMM register. 

YMM Encoding 

The first operand is a YMM register and the second operand is either a YMM register or a 256-bit 
memory location. The destination is a third YMM register. The mask is defined by bits 255, 223, 191, 
159, 127, 95, 63, and 31 of a fourth YMM register. 
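
The mask bit positions listed above follow directly from the element width: the mask bit for element i is bit (i × width + width − 1). The sketch below computes the positions and extracts the mask from a register modeled as a Python integer. The function names are hypothetical.

```python
def mask_bit_positions(register_bits: int, element_bits: int) -> list:
    """Bit positions of the most significant bit of each element, low to high."""
    count = register_bits // element_bits
    return [i * element_bits + element_bits - 1 for i in range(count)]


def variable_blend_mask(mask_reg: int, register_bits: int, element_bits: int) -> list:
    """Extract one mask bit per element from the mask register value."""
    return [(mask_reg >> p) & 1
            for p in mask_bit_positions(register_bits, element_bits)]
```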

Instruction Support 

Form         Subset   Feature Flag 
BLENDVPS     SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19) 
VBLENDVPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28) 

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 


Instruction Encoding 

Mnemonic                       Opcode           Description 
BLENDVPS xmm1, xmm2/mem128     66 0F 38 14 /r   Copies packed single-precision floating-point values 
                                                from xmm1 or xmm2/mem128 to xmm1, as specified by 
                                                bits in xmm0. 

Mnemonic                                   Encoding 
                                           VEX   RXB.map_select   W.vvvv.L.pp   Opcode 
VBLENDVPS xmm1, xmm2, xmm3/mem128, xmm4    C4    RXB.00011        X.src.0.01    4A /r 
VBLENDVPS ymm1, ymm2, ymm3/mem256, ymm4    C4    RXB.00011        X.src.1.01    4A /r 


Related Instructions 

(V)BLENDPD, (V)BLENDPS, (V)BLENDVPD 


rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 

Exception                  Real  Virt  Prot  Cause of Exception 
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier. 
                           A     A           AVX instructions are only recognized in protected mode. 
                           S     S     S     CR0.EM = 1. 
                           S     S     S     CR4.OSFXSR = 0. 
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. 
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b. 
                                       A     VEX.W = 1. 
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix. 
                           S     S     X     Lock prefix (F0h) preceding opcode. 
Device not available, #NM  S     S     X     CR0.TS = 1. 
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical. 
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical. 
                                       X     Null data segment used to reference memory. 
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. 
                                       A     Memory operand not 16-byte aligned when alignment checking enabled. 
Page fault, #PF                  S     X     Instruction execution caused a page fault. 

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 




CMPPD Compare 

VCMPPD Packed Double-Precision Floating-Point 

Compares each of the two packed double-precision floating-point values of the first source operand to 
the corresponding values of the second source operand and writes the result of each comparison to the 
corresponding 64-bit element of the destination. When a comparison is TRUE, all 64 bits of the 
destination element are set; when a comparison is FALSE, all 64 bits of the destination element are 
cleared. The type of comparison is specified by an immediate byte operand. 

Signed comparisons return TRUE only when both operands are valid numbers and the numbers have 
the relation specified by the type of comparison operation. Ordered comparison returns TRUE when 
both operands are valid numbers, or FALSE when either operand is a NaN. Unordered comparison 
returns TRUE only when one or both operands are NaN and FALSE otherwise. 

QNaN operands generate an Invalid Operation Exception (IE) only if the comparison type isn’t Equal, 
Unequal, Ordered, or Unordered. SNaN operands always generate an IE. 

There are legacy and extended forms of the instruction: 

CMPPD 

The first source operand is an XMM register and the second source operand is either an XMM register 
or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. Comparison type is specified by 
bits [2:0] of an immediate byte operand. 

VCMPPD 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. Comparison type is specified by bits [4:0] of an 
immediate byte operand. 

YMM Encoding 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination operand is a YMM register. Comparison type is 
specified by bits [4:0] of an immediate byte operand. 

Immediate Operand Encoding 

CMPPD uses bits [2:0] of the 8-bit immediate operand and VCMPPD uses bits [4:0] of the 8-bit 
immediate operand. Although VCMPPD supports 20h encoding values, the comparison types echo 
those of CMPPD on 4-bit boundaries. The following table shows the immediate operand value for 
CMPPD and each of the VCMPPD echoes. 

Some comparison operations that are not directly supported by immediate-byte encodings can be 
implemented by swapping the contents of the source and destination operands and executing the 
appropriate comparison of the swapped values. These additional comparison operations are shown 
with the directly supported comparison operations. 
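
The comparison semantics can be sketched as below. This is a simplified model, not a definitive implementation: registers are lists of Python floats, the predicate is taken from bits [2:0] of imm8 (which also covers the echoed VCMPPD values described above), and exception signaling is not modeled. The names are hypothetical.

```python
import math

ALL_ONES_64 = 0xFFFF_FFFF_FFFF_FFFF


def _unordered(a: float, b: float) -> bool:
    return math.isnan(a) or math.isnan(b)


# Indexed by imm8[2:0]. Python float comparisons return False when either
# operand is a NaN, which matches the "Result If NaN Operand" behavior.
PREDICATES = {
    0: lambda a, b: a == b,                # Equal
    1: lambda a, b: a < b,                 # Less than
    2: lambda a, b: a <= b,                # Less than or equal
    3: _unordered,                         # Unordered
    4: lambda a, b: not (a == b),          # Not equal
    5: lambda a, b: not (a < b),           # Not less than
    6: lambda a, b: not (a <= b),          # Not less than or equal
    7: lambda a, b: not _unordered(a, b),  # Ordered
}


def cmppd(src1: list, src2: list, imm8: int) -> list:
    """Return an all-ones or all-zeros 64-bit mask per element."""
    predicate = PREDICATES[imm8 & 0b111]
    return [ALL_ONES_64 if predicate(a, b) else 0 for a, b in zip(src1, src2)]
```

A "greater than" comparison of a against b is obtained by swapping the operands and using "less than": `cmppd(b, a, 0x01)`.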
Immediate Operand     Compare Operation                              Result If     QNaN Operand Causes 
Value                                                                NaN Operand   Invalid Operation Exception 
00h, 08h, 10h, 18h    Equal                                          FALSE         No 
01h, 09h, 11h, 19h    Less than                                      FALSE         Yes 
                      Greater than (swapped operands)                FALSE         Yes 
02h, 0Ah, 12h, 1Ah    Less than or equal                             FALSE         Yes 
                      Greater than or equal (swapped operands)       FALSE         Yes 
03h, 0Bh, 13h, 1Bh    Unordered                                      TRUE          No 
04h, 0Ch, 14h, 1Ch    Not equal                                      TRUE          No 
05h, 0Dh, 15h, 1Dh    Not less than                                  TRUE          Yes 
                      Not greater than (swapped operands)            TRUE          Yes 
06h, 0Eh, 16h, 1Eh    Not less than or equal                         TRUE          Yes 
                      Not greater than or equal (swapped operands)   TRUE          Yes 
07h, 0Fh, 17h, 1Fh    Ordered                                        FALSE         No 

The following alias mnemonics for (V)CMPPD with appropriate value of imm8 are supported. 

Mnemonic          Implied Value of imm8 
(V)CMPEQPD        00h, 08h, 10h, 18h 
(V)CMPLTPD        01h, 09h, 11h, 19h 
(V)CMPLEPD        02h, 0Ah, 12h, 1Ah 
(V)CMPUNORDPD     03h, 0Bh, 13h, 1Bh 
(V)CMPNEQPD       04h, 0Ch, 14h, 1Ch 
(V)CMPNLTPD       05h, 0Dh, 15h, 1Dh 
(V)CMPNLEPD       06h, 0Eh, 16h, 1Eh 
(V)CMPORDPD       07h, 0Fh, 17h, 1Fh 


Instruction Support 

Form      Subset   Feature Flag 
CMPPD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26) 
VCMPPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28) 

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 
Instruction Encoding 

Mnemonic                         Opcode           Description 
CMPPD xmm1, xmm2/mem128, imm8    66 0F C2 /r ib   Compares two pairs of values in xmm1 to 
                                                  corresponding values in xmm2 or mem128. 
                                                  Comparison type is determined by imm8. 
                                                  Writes comparison results to xmm1. 

Mnemonic                                Encoding 
                                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode 
VCMPPD xmm1, xmm2, xmm3/mem128, imm8    C4    RXB.00001        X.src.0.01    C2 /r ib 
VCMPPD ymm1, ymm2, ymm3/mem256, imm8    C4    RXB.00001        X.src.1.01    C2 /r ib 


Related Instructions 

(V)CMPPS, (V)CMPSD, (V)CMPSS, (V)COMISD, (V)COMISS, (V)UCOMISD, (V)UCOMISS 

rFLAGS Affected 

None 


MXCSR Flags Affected 

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE 
                                                                          M    M 
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0 

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 
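
As a minimal sketch of reading the two status flags this instruction may modify, the following decodes the DE and IE bits from an MXCSR value using the bit positions shown above. The caller is assumed to have obtained the MXCSR value by other means, and the names are hypothetical.

```python
MXCSR_IE = 1 << 0  # Invalid-operation exception flag, bit 0
MXCSR_DE = 1 << 1  # Denormalized-operand exception flag, bit 1


def compare_status(mxcsr: int) -> dict:
    """Report the state of the two flags a packed compare may modify."""
    return {"IE": bool(mxcsr & MXCSR_IE), "DE": bool(mxcsr & MXCSR_DE)}
```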
Exceptions 

Exception                  Real  Virt  Prot  Cause of Exception 
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier. 
                           A     A           AVX instructions are only recognized in protected mode. 
                           S     S     S     CR0.EM = 1. 
                           S     S     S     CR4.OSFXSR = 0. 
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. 
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b. 
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix. 
                           S     S     X     Lock prefix (F0h) preceding opcode. 
                           S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, 
                                             see SIMD Floating-Point Exceptions below for details. 
Device not available, #NM  S     S     X     CR0.TS = 1. 
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical. 
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical. 
                           S     S     S     Non-aligned memory operand while MXCSR.MM = 0. 
                                       X     Null data segment used to reference memory. 
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. 
                                       A     Memory operand not 16-byte aligned when alignment checking enabled. 
Page fault, #PF                  S     X     Instruction execution caused a page fault. 
SIMD floating-point, #XF   S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, 
                                             see SIMD Floating-Point Exceptions below for details. 

SIMD Floating-Point Exceptions 

Invalid operation, IE      S     S     X     A source operand was an SNaN value. 
                           S     S     X     Undefined operation. 
Denormalized operand, DE   S     S     X     A source operand was a denormal value. 

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
CMPPS Compare 

VCMPPS Packed Single-Precision Floating-Point 

Compares each of the four packed single-precision floating-point values of the first source operand to 
the corresponding values of the second source operand and writes the result of each comparison to the 
corresponding 32-bit element of the destination. When a comparison is TRUE, all 32 bits of the 
destination element are set; when a comparison is FALSE, all 32 bits of the destination element are 
cleared. The type of comparison is specified by an immediate byte operand. 

Signed comparisons return TRUE only when both operands are valid numbers and the numbers have 
the relation specified by the type of comparison operation. Ordered comparison returns TRUE when 
both operands are valid numbers, or FALSE when either operand is a NaN. Unordered comparison 
returns TRUE only when one or both operands are NaN and FALSE otherwise. 

QNaN operands generate an Invalid Operation Exception (IE) only if the comparison type isn’t Equal, 
Unequal, Ordered, or Unordered. SNaN operands always generate an IE. 
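
The rule for when a NaN operand raises IE can be sketched as follows. Operands are given as 32-bit single-precision bit patterns. The sketch assumes the usual encoding in which a quiet NaN has the most significant fraction bit set, and it takes the comparison type from bits [2:0] of imm8. The names are hypothetical.

```python
QUIET_BIT = 1 << 22                # most significant fraction bit
NON_SIGNALING_TYPES = {0, 3, 4, 7}  # Equal, Unordered, Not equal, Ordered


def classify32(bits: int) -> str:
    """Classify a single-precision bit pattern as 'number', 'qnan', or 'snan'."""
    exponent_all_ones = ((bits >> 23) & 0xFF) == 0xFF
    fraction = bits & 0x7FFFFF
    if not (exponent_all_ones and fraction != 0):
        return "number"            # includes infinities (fraction == 0)
    return "qnan" if fraction & QUIET_BIT else "snan"


def raises_ie(a_bits: int, b_bits: int, imm8: int) -> bool:
    """True when the pair of operands causes an Invalid Operation Exception."""
    kinds = {classify32(a_bits), classify32(b_bits)}
    if "snan" in kinds:
        return True                # SNaN operands always generate an IE
    return "qnan" in kinds and (imm8 & 0b111) not in NON_SIGNALING_TYPES
```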

There are legacy and extended forms of the instruction: 

CMPPS 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. Comparison type is specified by 
bits [2:0] of an immediate byte operand. 

VCMPPS 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. Comparison type is specified by bits [4:0] of an 
immediate byte operand. 

YMM Encoding 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination operand is a YMM register. Comparison type is 
specified by bits [4:0] of an immediate byte operand. 

Immediate Operand Encoding 

CMPPS uses bits [2:0] of the 8-bit immediate operand and VCMPPS uses bits [4:0] of the 8-bit 
immediate operand. Although VCMPPS supports 20h encoding values, the comparison types echo 
those of CMPPS on 4-bit boundaries. The following table shows the immediate operand value for 
CMPPS and each of the VCMPPS echoes. 

Some comparison operations that are not directly supported by immediate-byte encodings can be 
implemented by swapping the contents of the source and destination operands and executing the 
appropriate comparison of the swapped values. These additional comparison operations are shown 
with the directly supported comparison operations. 
Immediate Operand     Compare Operation                              Result If     QNaN Operand Causes 
Value                                                                NaN Operand   Invalid Operation Exception 
00h, 08h, 10h, 18h    Equal                                          FALSE         No 
01h, 09h, 11h, 19h    Less than                                      FALSE         Yes 
                      Greater than (swapped operands)                FALSE         Yes 
02h, 0Ah, 12h, 1Ah    Less than or equal                             FALSE         Yes 
                      Greater than or equal (swapped operands)       FALSE         Yes 
03h, 0Bh, 13h, 1Bh    Unordered                                      TRUE          No 
04h, 0Ch, 14h, 1Ch    Not equal                                      TRUE          No 
05h, 0Dh, 15h, 1Dh    Not less than                                  TRUE          Yes 
                      Not greater than (swapped operands)            TRUE          Yes 
06h, 0Eh, 16h, 1Eh    Not less than or equal                         TRUE          Yes 
                      Not greater than or equal (swapped operands)   TRUE          Yes 
07h, 0Fh, 17h, 1Fh    Ordered                                        FALSE         No 

The following alias mnemonics for (V)CMPPS with appropriate value of imm8 are supported. 

Mnemonic          Implied Value of imm8 
(V)CMPEQPS        00h, 08h, 10h, 18h 
(V)CMPLTPS        01h, 09h, 11h, 19h 
(V)CMPLEPS        02h, 0Ah, 12h, 1Ah 
(V)CMPUNORDPS     03h, 0Bh, 13h, 1Bh 
(V)CMPNEQPS       04h, 0Ch, 14h, 1Ch 
(V)CMPNLTPS       05h, 0Dh, 15h, 1Dh 
(V)CMPNLEPS       06h, 0Eh, 16h, 1Eh 
(V)CMPORDPS       07h, 0Fh, 17h, 1Fh 
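
The alias table is regular enough to express as a lookup on bits [2:0] of imm8. The following sketch maps an immediate value to the alias mnemonic listed above. The function name is hypothetical and the mapping covers only the aliases in this table.

```python
ALIAS_STEMS = ["EQ", "LT", "LE", "UNORD", "NEQ", "NLT", "NLE", "ORD"]


def cmpps_alias(imm8: int, extended: bool) -> str:
    """Return the alias mnemonic for a (V)CMPPS immediate value."""
    stem = ALIAS_STEMS[imm8 & 0b111]   # echoed values share bits [2:0]
    return ("V" if extended else "") + "CMP" + stem + "PS"
```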


Instruction Support 

Form      Subset   Feature Flag 
CMPPS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25) 
VCMPPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28) 

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 
Instruction Encoding 

Mnemonic                         Opcode        Description 
CMPPS xmm1, xmm2/mem128, imm8    0F C2 /r ib   Compares four pairs of values in xmm1 to 
                                               corresponding values in xmm2 or mem128. 
                                               Comparison type is determined by imm8. 
                                               Writes comparison results to xmm1. 

Mnemonic                                Encoding 
                                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode 
VCMPPS xmm1, xmm2, xmm3/mem128, imm8    C4    RXB.00001        X.src.0.00    C2 /r ib 


Related Instructions 

(V)CMPPD, (V)CMPSD, (V)CMPSS, (V)COMISD, (V)COMISS, (V)UCOMISD, (V)UCOMISS 

rFLAGS Affected 

None 


MXCSR Flags Affected 

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE 
                                                                          M    M 
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0 

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 
Exceptions 

Exception                  Real  Virt  Prot  Cause of Exception 
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier. 
                           A     A           AVX instructions are only recognized in protected mode. 
                           S     S     S     CR0.EM = 1. 
                           S     S     S     CR4.OSFXSR = 0. 
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. 
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b. 
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix. 
                           S     S     X     Lock prefix (F0h) preceding opcode. 
                           S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, 
                                             see SIMD Floating-Point Exceptions below for details. 
Device not available, #NM  S     S     X     CR0.TS = 1. 
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical. 
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical. 
                           S     S     S     Non-aligned memory operand while MXCSR.MM = 0. 
                                       X     Null data segment used to reference memory. 
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. 
                                       A     Memory operand not 16-byte aligned when alignment checking enabled. 
Page fault, #PF                  S     X     Instruction execution caused a page fault. 
SIMD floating-point, #XF   S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, 
                                             see SIMD Floating-Point Exceptions below for details. 

SIMD Floating-Point Exceptions 

Invalid operation, IE      S     S     X     A source operand was an SNaN value. 
                           S     S     X     Undefined operation. 
Denormalized operand, DE   S     S     X     A source operand was a denormal value. 

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 




CMPSD Compare 

VCMPSD Scalar Double-Precision Floating-Point 

Compares a double-precision floating-point value in the low-order 64 bits of the first source operand 
with a double-precision floating-point value in the low-order 64 bits of the second source operand and 
writes the result to the low-order 64 bits of the destination. When a comparison is TRUE, all 64 bits 
of the destination element are set; when a comparison is FALSE, all 64 bits of the destination element 
are cleared. Comparison type is specified by an immediate byte operand. 

Signed comparisons return TRUE only when both operands are valid numbers and the numbers have 
the relation specified by the type of comparison operation. Ordered comparison returns TRUE when 
both operands are valid numbers, or FALSE when either operand is a NaN. Unordered comparison 
returns TRUE only when one or both operands are NaN and FALSE otherwise. 

QNaN operands generate an Invalid Operation Exception (IE) only when the comparison type is not 
Equal, Unequal, Ordered, or Unordered. SNaN operands always generate an IE. 

There are legacy and extended forms of the instruction: 

CMPSD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 64-bit memory location. The first source register is also the destination. Bits [127:64] of the 
destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not 
affected. Comparison type is specified by bits [2:0] of an immediate byte operand. 

This CMPSD instruction must not be confused with the same-mnemonic CMPSD (compare strings 
by doubleword) instruction in the general-purpose instruction set. Assemblers can distinguish the 
instructions by the number and type of operands. 

VCMPSD 

The extended form of the instruction has a 128-bit encoding only.

The first source operand is an XMM register. The second source operand is either an XMM register or
a 64-bit memory location. The destination is a third XMM register. Bits [127:64] of the destination
are copied from bits [127:64] of the first source. Bits [255:128] of the YMM register that corresponds
to the destination are cleared. Comparison type is specified by bits [4:0] of an immediate byte
operand.

Immediate Operand Encoding 

CMPSD uses bits [2:0] of the 8-bit immediate operand and VCMPSD uses bits [4:0] of the 8-bit 
immediate operand. Although VCMPSD supports 20h encoding values, the comparison types echo 
those of CMPSD on 4-bit boundaries. The following table shows the immediate operand value for 
CMPSD and each of the VCMPSD echoes. 

Some comparison operations that are not directly supported by immediate-byte encodings can be 
implemented by swapping the contents of the source and destination operands and executing the 
appropriate comparison of the swapped values. These additional comparison operations are shown 
with the directly supported comparison operations. When operands are swapped, the first source 
XMM register is overwritten by the result. 

Immediate Operand Value   Compare Operation                              Result If NaN Operand   QNaN Operand Causes Invalid Operation Exception
00h, 08h, 10h, 18h        Equal                                          FALSE                   No
01h, 09h, 11h, 19h        Less than                                      FALSE                   Yes
                          Greater than (swapped operands)                FALSE                   Yes
02h, 0Ah, 12h, 1Ah        Less than or equal                             FALSE                   Yes
                          Greater than or equal (swapped operands)       FALSE                   Yes
03h, 0Bh, 13h, 1Bh        Unordered                                      TRUE                    No
04h, 0Ch, 14h, 1Ch        Not equal                                      TRUE                    No
05h, 0Dh, 15h, 1Dh        Not less than                                  TRUE                    Yes
                          Not greater than (swapped operands)            TRUE                    Yes
06h, 0Eh, 16h, 1Eh        Not less than or equal                         TRUE                    Yes
                          Not greater than or equal (swapped operands)   TRUE                    Yes
07h, 0Fh, 17h, 1Fh        Ordered                                        FALSE                   No
The following alias mnemonics for (V)CMPSD with appropriate value of imm8 are supported.

Mnemonic          Implied Value of imm8
(V)CMPEQSD        00h, 08h, 10h, 18h
(V)CMPLTSD        01h, 09h, 11h, 19h
(V)CMPLESD        02h, 0Ah, 12h, 1Ah
(V)CMPUNORDSD     03h, 0Bh, 13h, 1Bh
(V)CMPNEQSD       04h, 0Ch, 14h, 1Ch
(V)CMPNLTSD       05h, 0Dh, 15h, 1Dh
(V)CMPNLESD       06h, 0Eh, 16h, 1Eh
(V)CMPORDSD       07h, 0Fh, 17h, 1Fh
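
A disassembler-style lookup from imm8 to alias mnemonic might look like the following. It is a minimal sketch that takes the alias table exactly as listed here, so every listed value reduces to its low three bits; `ALIASES` and `alias_for` are hypothetical names.

```python
# Alias mnemonics indexed by imm8 & 7, as listed in the alias table.
ALIASES = (
    "CMPEQSD", "CMPLTSD", "CMPLESD", "CMPUNORDSD",
    "CMPNEQSD", "CMPNLTSD", "CMPNLESD", "CMPORDSD",
)


def alias_for(imm8: int, vex: bool = False) -> str:
    """Return the alias mnemonic for an immediate value (V-prefixed if vex)."""
    name = ALIASES[imm8 & 0x07]
    return "V" + name if vex else name
```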


Instruction Support 


Form      Subset   Feature Flag
CMPSD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCMPSD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
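
As a rough illustration of how the feature flags above are tested, the sketch below checks the listed bit positions in register values that are assumed to have been read already from CPUID function 0000_0001h. The function names are hypothetical, and obtaining EDX and ECX is outside the scope of the example.

```python
SSE2_BIT = 26  # CPUID Fn0000_0001_EDX[SSE2]
AVX_BIT = 28   # CPUID Fn0000_0001_ECX[AVX]


def supports_cmpsd(edx: int) -> bool:
    """True when the SSE2 feature bit is set in the given EDX value."""
    return bool((edx >> SSE2_BIT) & 1)


def supports_vcmpsd(ecx: int) -> bool:
    """True when the AVX feature bit is set in the given ECX value."""
    return bool((ecx >> AVX_BIT) & 1)
```

A set feature bit alone does not guarantee that the instruction executes: the Exceptions table for this instruction lists further conditions, such as CR4.OSXSAVE, that raise #UD.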

Instruction Encoding

Mnemonic                              Opcode           Description
CMPSD xmm1, xmm2/mem64, imm8          F2 0F C2 /r ib   Compares double-precision floating-point values in the low-order 64 bits of xmm1 with corresponding values in xmm2 or mem64. Comparison type is determined by imm8. Writes comparison results to xmm1.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCMPSD xmm1, xmm2, xmm3/mem64, imm8   C4    RXB.00001        X.src.X.11    C2 /r ib
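
To make the VEX field layout concrete, here is a sketch that packs the three-byte (C4h) prefix and a register-to-register VCMPSD. It assumes the usual VEX convention that R, X, B, and vvvv are stored in inverted (ones' complement) form, picks W = 0 for the don't-care bit, and handles registers 0 through 7 only. `vex3_prefix` and `vcmpsd_reg` are hypothetical helper names.

```python
def vex3_prefix(r: int, x: int, b: int, map_select: int,
                w: int, vvvv: int, l: int, pp: int) -> bytes:
    """Pack already-encoded field values into a three-byte VEX prefix."""
    byte1 = ((r & 1) << 7) | ((x & 1) << 6) | ((b & 1) << 5) | (map_select & 0x1F)
    byte2 = ((w & 1) << 7) | ((vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 0x3)
    return bytes([0xC4, byte1, byte2])


def vcmpsd_reg(dest: int, src1: int, src2: int, imm8: int) -> bytes:
    """Encode VCMPSD xmm<dest>, xmm<src1>, xmm<src2>, imm8 (registers 0-7 only)."""
    prefix = vex3_prefix(
        r=1, x=1, b=1,             # stored inverted: 1 means no register extension
        map_select=0b00001,
        w=0,                       # W is "X" (don't care) in the table; 0 chosen here
        vvvv=(~src1) & 0xF,        # first source, stored inverted
        l=0,                       # 128-bit encoding only
        pp=0b11,
    )
    modrm = 0xC0 | ((dest & 7) << 3) | (src2 & 7)  # mod = 11b, register operands
    return prefix + bytes([0xC2, modrm, imm8 & 0xFF])
```

Under these assumptions, `vcmpsd_reg(1, 2, 3, 0x01).hex()` gives `c4e16bc2cb01`. Assemblers may emit the shorter two-byte VEX form for the same instruction.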


Related Instructions 

(V)CMPPD, (V)CMPPS, (V)CMPSS, (V)COMISD, (V)COMISS, (V)UCOMISD, (V)UCOMISS 


rFLAGS Affected 

None 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                                          M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 
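
The bit positions in the table above can be used to decode an MXCSR value. The following is a minimal sketch with hypothetical names (`MXCSR_BITS`, `decode_mxcsr`); it only extracts fields and does not model how an instruction updates them.

```python
# Single-bit MXCSR fields and their bit positions, taken from the table above.
MXCSR_BITS = {
    "IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5, "DAZ": 6,
    "IM": 7, "DM": 8, "ZM": 9, "OM": 10, "UM": 11, "PM": 12,
    "FZ": 15, "MM": 17,
}


def decode_mxcsr(value: int) -> dict:
    """Split an MXCSR value into named fields."""
    fields = {name: (value >> bit) & 1 for name, bit in MXCSR_BITS.items()}
    fields["RC"] = (value >> 13) & 0b11  # rounding control occupies bits 14:13
    return fields
```

For instance, `decode_mxcsr(0x1F83)` reports IE = 1 and DE = 1 (the two flags this instruction may modify) with all six exception masks set.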

Exceptions

Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
                             S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF     S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE        S     S     X     A source operand was an SNaN value.
                             S     S     X     Undefined operation.
Denormalized operand, DE     S     S     X     A source operand was a denormal value.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
CMPSS Compare

VCMPSS Scalar Single-Precision Floating-Point 

Compares a single-precision floating-point value in the low-order 32 bits of the first source operand 
with a single-precision floating-point value in the low-order 32 bits of the second source operand and 
writes the result to the low-order 32 bits of the destination. When a comparison is TRUE, all 32 bits 
of the destination element are set; when a comparison is FALSE, all 32 bits of the destination element 
are cleared. Comparison type is specified by an immediate byte operand. 

Signed comparisons return TRUE only when both operands are valid numbers and the numbers have 
the relation specified by the type of comparison operation. Ordered comparison returns TRUE when 
both operands are valid numbers, or FALSE when either operand is a NaN. Unordered comparison 
returns TRUE only when one or both operands are NaN and FALSE otherwise. 

QNaN operands generate an Invalid Operation Exception (IE) only if the comparison type isn’t Equal, 
Unequal, Ordered, or Unordered. SNaN operands always generate an IE. 

There are legacy and extended forms of the instruction: 

CMPSS 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 32-bit memory location. The first source register is also the destination. Bits [127:32] of the
destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination
are not affected. Comparison type is specified by bits [2:0] of an immediate byte operand.

VCMPSS 

The extended form of the instruction has a 128-bit encoding only. 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 32-bit memory location. The destination is a third XMM register. Bits [127:32] of the destination
are copied from bits [127:32] of the first source. Bits [255:128] of the YMM register that
corresponds to the destination are cleared. Comparison type is specified by bits [4:0] of an
immediate byte operand.
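
The difference in how the two forms treat the rest of the destination can be sketched by modeling a 256-bit YMM register as a Python integer. This is an illustrative model only; `cmpss_dest` and `vcmpss_dest` are hypothetical names, and `result32` stands for the 32-bit comparison mask.

```python
MASK32 = (1 << 32) - 1
MASK128 = (1 << 128) - 1


def cmpss_dest(dest_ymm: int, result32: int) -> int:
    """Legacy form: only bits [31:0] change; bits [255:32] are not affected."""
    return (dest_ymm & ~MASK32) | (result32 & MASK32)


def vcmpss_dest(src1_xmm: int, result32: int) -> int:
    """Extended form: bits [127:32] come from the first source and
    bits [255:128] of the destination are cleared."""
    return (src1_xmm & MASK128 & ~MASK32) | (result32 & MASK32)
```

With the legacy form, stale data above bit 31 survives in the destination; with the extended form, the result depends only on the first source and the comparison mask.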

Immediate Operand Encoding 

CMPSS uses bits [2:0] of the 8-bit immediate operand and VCMPSS uses bits [4:0] of the 8-bit 
immediate operand. Although VCMPSS supports 20h encoding values, the comparison types echo 
those of CMPSS on 4-bit boundaries. The following table shows the immediate operand value for 
CMPSS and each of the VCMPSS echoes. 

Some comparison operations that are not directly supported by immediate-byte encodings can be 
implemented by swapping the contents of the source and destination operands and executing the 
appropriate comparison of the swapped values. These additional comparison operations are shown 
below with the directly supported comparison operations. When operands are swapped, the first 
source XMM register is overwritten by the result. 

Immediate Operand Value   Compare Operation                              Result If NaN Operand   QNaN Operand Causes Invalid Operation Exception
00h, 08h, 10h, 18h        Equal                                          FALSE                   No
01h, 09h, 11h, 19h        Less than                                      FALSE                   Yes
                          Greater than (swapped operands)                FALSE                   Yes
02h, 0Ah, 12h, 1Ah        Less than or equal                             FALSE                   Yes
                          Greater than or equal (swapped operands)       FALSE                   Yes
03h, 0Bh, 13h, 1Bh        Unordered                                      TRUE                    No
04h, 0Ch, 14h, 1Ch        Not equal                                      TRUE                    No
05h, 0Dh, 15h, 1Dh        Not less than                                  TRUE                    Yes
                          Not greater than (swapped operands)            TRUE                    Yes
06h, 0Eh, 16h, 1Eh        Not less than or equal                         TRUE                    Yes
                          Not greater than or equal (swapped operands)   TRUE                    Yes
07h, 0Fh, 17h, 1Fh        Ordered                                        FALSE                   No

The following alias mnemonics for (V)CMPSS with appropriate value of imm8 are supported. 


Mnemonic          Implied Value of imm8
(V)CMPEQSS        00h, 08h, 10h, 18h
(V)CMPLTSS        01h, 09h, 11h, 19h
(V)CMPLESS        02h, 0Ah, 12h, 1Ah
(V)CMPUNORDSS     03h, 0Bh, 13h, 1Bh
(V)CMPNEQSS       04h, 0Ch, 14h, 1Ch
(V)CMPNLTSS       05h, 0Dh, 15h, 1Dh
(V)CMPNLESS       06h, 0Eh, 16h, 1Eh
(V)CMPORDSS       07h, 0Fh, 17h, 1Fh


Instruction Support 


Form      Subset   Feature Flag
CMPSS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VCMPSS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                              Opcode           Description
CMPSS xmm1, xmm2/mem32, imm8          F3 0F C2 /r ib   Compares single-precision floating-point values in the low-order 32 bits of xmm1 with corresponding values in xmm2 or mem32. Comparison type is determined by imm8. Writes comparison results to xmm1.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCMPSS xmm1, xmm2, xmm3/mem32, imm8   C4    RXB.00001        X.src.X.10    C2 /r ib

Related Instructions 

(V)CMPPD, (V)CMPPS, (V)CMPSD, (V)COMISD, (V)COMISS, (V)UCOMISD, (V)UCOMISS 

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                                          M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 




Exceptions 


Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
                             S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF     S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE        S     S     X     A source operand was an SNaN value.
                             S     S     X     Undefined operation.
Denormalized operand, DE     S     S     X     A source operand was a denormal value.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
COMISD Compare Ordered

VCOMISD Scalar Double-Precision Floating-Point 

Compares a double-precision floating-point value in the low-order 64 bits of the first operand with a 
double-precision floating-point value in the low-order 64 bits of the second operand and sets 
rFLAGS.ZF, PF, and CF to show the result of the comparison: 


Comparison               ZF   PF   CF
NaN input                1    1    1
operand 1 > operand 2    0    0    0
operand 1 < operand 2    0    0    1
operand 1 == operand 2   1    0    0


The result is unordered if one or both of the operand values is a NaN. The rFLAGS.OF, AF, and SF 
bits are cleared. If an #XF SIMD floating-point exception occurs the rFLAGS bits are not updated. 
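
The flag settings above can be expressed as a small model. This is a sketch only: `comisd_flags` is a hypothetical name, and the #XF case (in which rFLAGS is not updated) is not modeled.

```python
import math


def comisd_flags(op1: float, op2: float) -> dict:
    """Return the rFLAGS bits written by an ordered scalar compare."""
    if math.isnan(op1) or math.isnan(op2):
        zf, pf, cf = 1, 1, 1        # unordered result
    elif op1 > op2:
        zf, pf, cf = 0, 0, 0
    elif op1 < op2:
        zf, pf, cf = 0, 0, 1
    else:
        zf, pf, cf = 1, 0, 0        # operands are equal
    # OF, AF, and SF are always cleared.
    return {"ZF": zf, "PF": pf, "CF": cf, "OF": 0, "AF": 0, "SF": 0}
```

Because an unordered result sets all three of ZF, PF, and CF, code that tests only CF or ZF after the compare cannot tell "less than" or "equal" apart from a NaN input; checking PF first separates the unordered case.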

There are legacy and extended forms of the instruction: 

COMISD 

The first source operand is an XMM register and the second source operand is an XMM register or a 
64-bit memory location. 

VCOMISD 

The extended form of the instruction has a 128-bit encoding only.

The first source operand is an XMM register and the second source operand is either an XMM
register or a 64-bit memory location.


Instruction Support 


Form      Subset   Feature Flag
COMISD    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCOMISD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.



Instruction Encoding

Mnemonic                      Opcode        Description
COMISD xmm1, xmm2/mem64       66 0F 2F /r   Compares double-precision floating-point values in xmm1 with corresponding values in xmm2 or mem64 and sets rFLAGS.

Mnemonic                      Encoding
                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCOMISD xmm1, xmm2/mem64      C4    RXB.00001        X.src.X.01    2F /r

Related Instructions

(V)CMPPD, (V)CMPPS, (V)CMPSD, (V)CMPSS, (V)COMISS, (V)UCOMISD, (V)UCOMISS

rFLAGS Affected

ID   VIP   VIF   AC   VM   RF   NT   IOPL    OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                             0                   0    M    0    M    M
21   20    19    18   17   16   14   13 12   11   10   9    8    7    6    4    2    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 

Bits 31:22, 15, 5, 3, and 1 are reserved. For #XF, rFLAGS bits are not updated. 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                                          M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Unaffected flags are blank. 

Exceptions

Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
                             S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF     S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE        S     S     X     A source operand was an SNaN value.
                             S     S     X     Undefined operation.
Denormalized operand, DE     S     S     X     A source operand was a denormal value.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
COMISS Compare

VCOMISS Ordered Scalar Single-Precision Floating-Point 

Compares a single-precision floating-point value in the low-order 32 bits of the first operand with a
single-precision floating-point value in the low-order 32 bits of the second operand and sets
rFLAGS.ZF, PF, and CF to show the result of the comparison:


Comparison               ZF   PF   CF
NaN input                1    1    1
operand 1 > operand 2    0    0    0
operand 1 < operand 2    0    0    1
operand 1 == operand 2   1    0    0


The result is unordered if one or both of the operand values is a NaN. The rFLAGS.OF, AF, and SF 
bits are cleared. If an #XF SIMD floating-point exception occurs the rFLAGS bits are not updated. 

There are legacy and extended forms of the instruction: 

COMISS 

The first source operand is an XMM register and the second source operand is an XMM register or a 
32-bit memory location. 

VCOMISS 

The extended form of the instruction has a 128-bit encoding only.

The first source operand is an XMM register and the second source operand is either an XMM
register or a 32-bit memory location.


Instruction Support 


Form      Subset   Feature Flag
COMISS    SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VCOMISS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                      Opcode        Description
COMISS xmm1, xmm2/mem32       0F 2F /r      Compares single-precision floating-point values in xmm1 with corresponding values in xmm2 or mem32 and sets rFLAGS.

Mnemonic                      Encoding
                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCOMISS xmm1, xmm2/mem32      C4    RXB.00001        X.src.X.00    2F /r

Related Instructions 

(V)CMPPD, (V)CMPPS, (V)CMPSD, (V)CMPSS, (V)COMISD, (V)UCOMISD, (V)UCOMISS 

rFLAGS Affected

ID   VIP   VIF   AC   VM   RF   NT   IOPL    OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                             0                   0    M    0    M    M
21   20    19    18   17   16   14   13 12   11   10   9    8    7    6    4    2    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 

Bits 31:22, 15, 5, 3, and 1 are reserved. For #XF, rFLAGS bits are not updated. 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                                          M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
                             S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF     S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE        S     S     X     A source operand was an SNaN value.
                             S     S     X     Undefined operation.
Denormalized operand, DE     S     S     X     A source operand was a denormal value.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
CVTDQ2PD Convert Packed Doubleword Integers

VCVTDQ2PD to Packed Double-Precision Floating-Point 

Converts packed 32-bit signed integer values to packed double-precision floating-point values and 
writes the converted values to the destination. 

There are legacy and extended forms of the instruction:

CVTDQ2PD 

Converts two packed 32-bit signed integer values in the low-order 64 bits of an XMM register or in a
64-bit memory location to two packed double-precision floating-point values and writes the
converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the
destination are not affected.

VCVTDQ2PD 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding 

Converts two packed 32-bit signed integer values in the low-order 64 bits of an XMM register or in a
64-bit memory location to two packed double-precision floating-point values and writes the
converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.

YMM Encoding 

Converts four packed 32-bit signed integer values in the low-order 128 bits of a YMM register or a
256-bit memory location to four packed double-precision floating-point values and writes the
converted values to a YMM register.
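
The 128-bit conversion can be modeled in a few lines. The sketch below assumes a little-endian 64-bit source holding two doublewords, and `cvtdq2pd_model` is a hypothetical name. Every 32-bit signed integer is exactly representable in double precision, so the model needs no rounding step.

```python
import struct


def cvtdq2pd_model(low64: bytes) -> tuple:
    """Convert two packed little-endian int32 values to two doubles."""
    first, second = struct.unpack("<2i", low64)
    return float(first), float(second)
```

For example, `cvtdq2pd_model(struct.pack("<2i", -2147483648, 2147483647))` returns both extremes of the doubleword range without loss.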

Instruction Support 


Form        Subset   Feature Flag
CVTDQ2PD    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTDQ2PD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                        Opcode        Description
CVTDQ2PD xmm1, xmm2/mem64       F3 0F E6 /r   Converts packed doubleword signed integers in xmm2 or mem64 to double-precision floating-point values in xmm1.

Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCVTDQ2PD xmm1, xmm2/mem64      C4    RXB.00001        X.1111.0.10   E6 /r
VCVTDQ2PD ymm1, ymm2/mem256     C4    RXB.00001        X.1111.1.10   E6 /r

Related Instructions

(V)CVTPD2DQ, (V)CVTPI2PD, (V)CVTSD2SI, (V)CVTSI2SD, (V)CVTTPD2DQ, 
(V)CVTTSD2SI 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference with alignment checking enabled.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
CVTDQ2PS Convert Packed Doubleword Integers

VCVTDQ2PS to Packed Single-Precision Floating-Point 

Converts packed 32-bit signed integer values to packed single-precision floating-point values and 
writes the converted values to the destination. When the result is an inexact value, it is rounded as 
specified by MXCSR.RC. 
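
An inexact conversion is easy to demonstrate in software. The sketch below models only round-to-nearest-even, which is assumed here to correspond to MXCSR.RC = 00b; the other rounding modes are not modeled. `to_single_nearest` and `is_inexact` are hypothetical names.

```python
import struct


def to_single_nearest(value: int) -> float:
    """Round a 32-bit signed integer to single precision (nearest-even only)."""
    # Packing as a 4-byte float narrows the exact double to single precision.
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


def is_inexact(value: int) -> bool:
    """True when the integer cannot be represented exactly in single precision."""
    return to_single_nearest(value) != value
```

Single precision carries a 24-bit significand, so integers with magnitude above 2^24 may be rounded: `to_single_nearest(16777217)` returns 16777216.0, the case in which the precision exception (PE) would be reported.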

There are legacy and extended forms of the instruction:

CVTDQ2PS 

Converts four packed 32-bit signed integer values in an XMM register or a 128-bit memory location
to four packed single-precision floating-point values and writes the converted values to an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VCVTDQ2PS 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding 

Converts four packed 32-bit signed integer values in an XMM register or a 128-bit memory location
to four packed single-precision floating-point values and writes the converted values to an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding 

Converts eight packed 32-bit signed integer values in a YMM register or a 256-bit memory location
to eight packed single-precision floating-point values and writes the converted values to a YMM
register.

Instruction Support 


Form        Subset   Feature Flag
CVTDQ2PS    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTDQ2PS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                        Opcode        Description
CVTDQ2PS xmm1, xmm2/mem128      0F 5B /r      Converts packed doubleword integer values in xmm2 or mem128 to packed single-precision floating-point values in xmm1.

Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCVTDQ2PS xmm1, xmm2/mem128     C4    RXB.00001        X.1111.0.00   5B /r
VCVTDQ2PS ymm1, ymm2/mem256     C4    RXB.00001        X.1111.1.00   5B /r

Related Instructions

(V)CVTPS2DQ, (V)CVTSI2SS, (V)CVTSS2SI, (V)CVTTPS2DQ, (V)CVTTSS2SI

rFLAGS Affected

None 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
                             S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                             S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF     S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Precision, PE                S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 


CVTPD2DQ Convert Packed Double-Precision Floating-Point
VCVTPD2DQ to Packed Doubleword Integer 

Converts packed double-precision floating-point values to packed signed doubleword integers and 
writes the converted values to the destination. 

When the result is an inexact value, it is rounded as specified by MXCSR.RC. When the floating-point
value is a NaN or infinity, or the result of the conversion is outside the signed doubleword range
(-2^31 to +2^31 - 1), the instruction returns the 32-bit indefinite integer value (8000_0000h)
when the invalid-operation exception (IE) is masked.
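
The per-element behavior described above can be sketched as follows. The model assumes IE is masked and models only round-to-nearest-even (assumed here to be the MXCSR.RC = 00b mode). `INDEFINITE` and `cvtpd2dq_element` are hypothetical names.

```python
import math

INDEFINITE = 0x8000_0000  # 32-bit indefinite integer value


def cvtpd2dq_element(x: float) -> int:
    """Return the 32-bit result pattern for one double-precision element."""
    if math.isnan(x) or math.isinf(x):
        return INDEFINITE
    rounded = round(x)  # Python's round() resolves ties to the even integer
    if not -2**31 <= rounded <= 2**31 - 1:
        return INDEFINITE
    return rounded & 0xFFFF_FFFF  # two's-complement doubleword pattern
```

Note that the indefinite value has the same bit pattern as the most negative doubleword, so a result of 8000_0000h alone does not tell the two cases apart; the IE flag in MXCSR does.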

There are legacy and extended forms of the instruction:

CVTPD2DQ 

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory
location to two packed signed doubleword integers and writes the converted values to the two
low-order doublewords of the destination XMM register. Bits [127:64] of the destination are cleared.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VCVTPD2DQ 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding 

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory
location to two signed doubleword values and writes the converted values to the lower two
doubleword elements of the destination XMM register. Bits [127:64] of the destination are cleared.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding 

Converts four packed double-precision floating-point values in a YMM register or a 256-bit memory 
location to four signed doubleword values and writes the converted values to an XMM register. Bits 
[255:128] of the YMM register that corresponds to the destination are cleared. 

Instruction Support 


Form        Subset   Feature Flag
CVTPD2DQ    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTPD2DQ   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                        Opcode        Description
CVTPD2DQ xmm1, xmm2/mem128      F2 0F E6 /r   Converts two packed double-precision floating-point values in xmm2 or mem128 to packed doubleword integers in xmm1.

Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCVTPD2DQ xmm1, xmm2/mem128     C4    RXB.00001        X.1111.0.11   E6 /r
VCVTPD2DQ xmm1, ymm2/mem256     C4    RXB.00001        X.1111.1.11   E6 /r

Related Instructions

(V)CVTDQ2PD, (V)CVTPI2PD, (V)CVTSD2SI, (V)CVTSI2SD, (V)CVTTPD2DQ, 
(V)CVTTSD2SI 

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M                        M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
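
The two rows that mention CR4.OSXMMEXCPT describe one decision: an unmasked SIMD floating-point exception is reported as #XF when CR4.OSXMMEXCPT = 1 and as #UD when it is 0. The sketch below models only that decision. It assumes the MXCSR layout shown earlier (exception flags in bits 5:0, the corresponding masks in bits 12:7); the function name and the string return values are hypothetical.

```python
EXCEPTION_FLAGS = 0x3F  # IE, DE, ZE, OE, UE, PE in MXCSR bits 5:0
MASK_SHIFT = 7          # IM, DM, ZM, OM, UM, PM sit seven bits above their flags


def simd_fp_fault(raised, mxcsr, osxmmexcpt):
    """Return "#XF", "#UD", or None for a set of raised conditions.

    raised uses MXCSR flag positions (bit 0 = IE ... bit 5 = PE);
    osxmmexcpt is the state of CR4.OSXMMEXCPT.
    """
    masks = (mxcsr >> MASK_SHIFT) & EXCEPTION_FLAGS
    unmasked = raised & EXCEPTION_FLAGS & ~masks
    if not unmasked:
        return None  # every raised condition is masked; no fault is taken
    return "#XF" if osxmmexcpt else "#UD"
```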


Instruction Reference 


CVTPD2DQ, VCVTPD2DQ 
CVTPD2PS Convert Packed Double-Precision Floating-Point 
VCVTPD2PS to Packed Single-Precision Floating-Point 

Converts packed double-precision floating-point values to packed single-precision floating-point 
values and writes the converted values to the low-order doubleword elements of the destination. When 
the result is an inexact value, it is rounded as specified by MXCSR.RC. 

There are legacy and extended forms of the instruction: 

CVTPD2PS 

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory 
location to two packed single-precision floating-point values and writes the converted values to an 
XMM register. Bits [127:64] of the destination are cleared. Bits [255:128] of the YMM register that 
corresponds to the destination are not affected. 

VCVTPD2PS 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory 
location to two packed single-precision floating-point values and writes the converted values to an 
XMM register. Bits [127:64] of the destination are cleared. Bits [255:128] of the YMM register that 
corresponds to the destination are cleared. 

YMM Encoding 

Converts four packed double-precision floating-point values in a YMM register or a 256-bit memory 
location to four packed single-precision floating-point values and writes the converted values to an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 
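
The narrowing and lane placement described above can be modelled in a few lines. This is a sketch under stated assumptions rather than a description of the hardware: it models only round-to-nearest (RC = 00b) with all exceptions masked, it represents registers as Python lists with lane 0 first, and the function names are hypothetical.

```python
import math
import struct


def to_single(value):
    """Round a double to the nearest single (ties to even), returned as a Python float."""
    try:
        return struct.unpack("<f", struct.pack("<f", value))[0]
    except OverflowError:
        # struct rejects finite values beyond the single-precision range;
        # model the masked overflow result under round-to-nearest as infinity.
        return math.copysign(math.inf, value)


def cvtpd2ps_xmm(src):
    """Two doubles in, four single lanes out; bits [127:64] are cleared."""
    if len(src) != 2:
        raise ValueError("the 128-bit source holds two doubles")
    return [to_single(v) for v in src] + [0.0, 0.0]


def vcvtpd2ps_ymm(src):
    """Four doubles in, four single lanes (one XMM register) out."""
    if len(src) != 4:
        raise ValueError("the 256-bit source holds four doubles")
    return [to_single(v) for v in src]
```

Comparing `to_single(x)` with `x` shows whether a conversion was inexact, which is the condition reported through the PE flag.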

Instruction Support 


Form         Subset   Feature Flag
CVTPD2PS     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTPD2PS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 


Instruction Encoding 

Mnemonic                        Opcode         Description
CVTPD2PS xmm1, xmm2/mem128      66 0F 5A /r    Converts packed double-precision floating-point
                                               values in xmm2 or mem128 to packed single-precision
                                               floating-point values in xmm1.

                                Encoding
Mnemonic                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCVTPD2PS xmm1, xmm2/mem128     C4    RXB.00001        X.1111.0.01   5A /r
VCVTPD2PS xmm1, ymm2/mem256     C4    RXB.00001        X.1111.1.01   5A /r
Related Instructions 

(V)CVTPS2PD, (V)CVTSD2SS, (V)CVTSS2SD 

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M    M         M    M
17   15   14:13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.
Overflow, OE                S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 


CVTPS2DQ Convert Packed Single-Precision Floating-Point 
VCVTPS2DQ to Packed Doubleword Integers 

Converts packed single-precision floating-point values to packed signed doubleword integer values 
and writes the converted values to the destination. 

When the result is an inexact value, it is rounded as specified by MXCSR.RC. When the floating-point 
value is a NaN or infinity, or the result of the conversion falls outside the signed doubleword range 
(-2^31 to +2^31 - 1), the instruction returns the 32-bit indefinite integer value (8000_0000h) 
when the invalid-operation exception (IE) is masked. 
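
The rounding and out-of-range behavior described above can be sketched per element as follows. The model assumes the invalid-operation exception is masked and uses the MXCSR.RC encodings 00b (nearest, ties to even), 01b (toward negative infinity), 10b (toward positive infinity), and 11b (toward zero). The function names are hypothetical, and a Python float stands in for the single-precision source value, which is safe because every single is exactly representable as a double.

```python
import math

INDEFINITE_32 = -0x8000_0000  # bit pattern 8000_0000h read as a signed doubleword


def round_rc(value, rc):
    """Round a finite float to an integer according to a 2-bit RC value."""
    if rc == 0b00:
        return round(value)        # nearest, ties to even
    if rc == 0b01:
        return math.floor(value)   # toward negative infinity
    if rc == 0b10:
        return math.ceil(value)    # toward positive infinity
    if rc == 0b11:
        return math.trunc(value)   # toward zero
    raise ValueError("rc must be a 2-bit value")


def convert_element(value, rc=0b00):
    """Model one lane of the conversion with IE masked."""
    if math.isnan(value) or math.isinf(value):
        return INDEFINITE_32
    result = round_rc(value, rc)
    if not -2**31 <= result <= 2**31 - 1:
        return INDEFINITE_32
    return result


def cvtps2dq(lanes, rc=0b00):
    """Apply the element conversion to every lane (four or eight values)."""
    return [convert_element(v, rc) for v in lanes]
```

Masking the signed result with FFFF_FFFFh recovers the stored bit pattern, so `INDEFINITE_32 & 0xFFFFFFFF` is 8000_0000h.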

There are legacy and extended forms of the instruction: 

CVTPS2DQ 

Converts four packed single-precision floating-point values in an XMM register or a 128-bit memory 
location to four packed signed doubleword integer values and writes the converted values to an XMM 
register. Bits [255:128] of the YMM register that corresponds to the destination are not affected. 

VCVTPS2DQ 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Converts four packed single-precision floating-point values in an XMM register or a 128-bit memory 
location to four packed signed doubleword integer values and writes the converted values to an XMM 
register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

Converts eight packed single-precision floating-point values in a YMM register or a 256-bit memory 
location to eight packed signed doubleword integer values and writes the converted values to a YMM 
register. 

Instruction Support 


Form         Subset   Feature Flag
CVTPS2DQ     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTPS2DQ    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 


Instruction Encoding 


Mnemonic                        Opcode         Description
CVTPS2DQ xmm1, xmm2/mem128      66 0F 5B /r    Converts four packed single-precision floating-point
                                               values in xmm2 or mem128 to four packed
                                               doubleword integers in xmm1.

                                Encoding
Mnemonic                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCVTPS2DQ xmm1, xmm2/mem128     C4    RXB.00001        X.1111.0.01   5B /r
VCVTPS2DQ ymm1, ymm2/mem256     C4    RXB.00001        X.1111.1.01   5B /r
Related Instructions 

(V)CVTDQ2PS, (V)CVTSI2SS, (V)CVTSS2SI, (V)CVTTPS2DQ, (V)CVTTSS2SI 

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M                        M
17   15   14:13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 


CVTPS2PD Convert Packed Single-Precision Floating-Point 
VCVTPS2PD to Packed Double-Precision Floating-Point 

Converts packed single-precision floating-point values to packed double-precision floating-point 
values and writes the converted values to the destination. 

There are legacy and extended forms of the instruction: 

CVTPS2PD 

Converts two packed single-precision floating-point values in the two low-order doubleword 
elements of an XMM register or a 64-bit memory location to two double-precision floating-point values 
and writes the converted values to an XMM register. Bits [255:128] of the YMM register that 
corresponds to the destination are not affected. 

VCVTPS2PD 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Converts two packed single-precision floating-point values in the two low-order doubleword 
elements of an XMM register or a 64-bit memory location to two double-precision floating-point values 
and writes the converted values to an XMM register. Bits [255:128] of the YMM register that 
corresponds to the destination are cleared. 

YMM Encoding 

Converts four packed single-precision floating-point values in a YMM register or a 128-bit memory 
location to four double-precision floating-point values and writes the converted values to a YMM 
register. 
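
Because every single-precision value is exactly representable in double precision, this widening never rounds, which is consistent with the MXCSR table for this instruction marking only DE and IE. The sketch below illustrates the point using raw 32-bit single-precision bit patterns as input. The function name is hypothetical, and NaN inputs are left out because their handling is not modelled here.

```python
import struct


def cvtps2pd(single_bits):
    """Widen a list of 32-bit single-precision bit patterns to Python floats (doubles)."""
    return [struct.unpack("<f", struct.pack("<I", bits))[0] for bits in single_bits]


# 3F80_0000h is 1.0, C000_0000h is -2.0, and 3DCC_CCCDh is the single nearest to 0.1.
widened = cvtps2pd([0x3F800000, 0xC0000000, 0x3DCCCCCD])
```

The third result is 0.10000000149011612 rather than 0.1: the double holds the single's value exactly, including the rounding error the single already carried.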

Instruction Support 


Form         Subset   Feature Flag
CVTPS2PD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTPS2PD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 


Instruction Encoding 

Mnemonic                        Opcode         Description
CVTPS2PD xmm1, xmm2/mem64       0F 5A /r       Converts packed single-precision floating-point values
                                               in xmm2 or mem64 to packed double-precision
                                               floating-point values in xmm1.

                                Encoding
Mnemonic                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCVTPS2PD xmm1, xmm2/mem64      C4    RXB.00001        X.1111.0.00   5A /r
VCVTPS2PD ymm1, ymm2/mem128     C4    RXB.00001        X.1111.1.00   5A /r


Related Instructions 

(V)CVTPD2PS, (V)CVTSD2SS, (V)CVTSS2SD 
rFLAGS Affected 

None 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                                          M    M
17   15   14:13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 


CVTSD2SI Convert Scalar Double-Precision Floating-Point 
VCVTSD2SI to Signed Doubleword or Quadword Integer 

Converts a scalar double-precision floating-point value to a 32-bit or 64-bit signed integer value and 
writes the converted value to a general-purpose register. 

When the result is an inexact value, it is rounded as specified by MXCSR.RC. When the floating-point 
value is a NaN or infinity, or the result of the conversion falls outside the signed doubleword range 
(-2^31 to +2^31 - 1) or quadword range (-2^63 to +2^63 - 1), the instruction returns the indefinite 
integer value (8000_0000h for 32-bit integers, 8000_0000_0000_0000h for 64-bit integers) when the 
invalid-operation exception (IE) is masked. 

There are legacy and extended forms of the instruction: 

CVTSD2SI 

The legacy form has two encodings: 

• When REX.W = 0, converts a scalar double-precision floating-point value in the low-order 64 bits 
of an XMM register or a 64-bit memory location to a 32-bit signed integer and writes the converted 
value to a 32-bit general purpose register. 

• When REX.W = 1, converts a scalar double-precision floating-point value in the low-order 64 bits 
of an XMM register or a 64-bit memory location to a 64-bit sign-extended integer and writes the 
converted value to a 64-bit general purpose register. 

VCVTSD2SI 

The extended form of the instruction has two 128-bit encodings: 

• When VEX.W = 0, converts a scalar double-precision floating-point value in the low-order 64 bits 
of an XMM register or a 64-bit memory location to a 32-bit signed integer and writes the converted 
value to a 32-bit general purpose register. 

• When VEX.W = 1, converts a scalar double-precision floating-point value in the low-order 64 bits 
of an XMM register or a 64-bit memory location to a 64-bit sign-extended integer and writes the 
converted value to a 64-bit general purpose register. 
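
In both forms the W bit (REX.W or VEX.W) selects the destination width, and with it the valid range and the indefinite value. A minimal sketch of that selection follows, assuming round-to-nearest (RC = 00b) and a masked invalid-operation exception. The function name is hypothetical.

```python
import math


def cvtsd2si(value, w):
    """Model the scalar conversion for W = 0 (32-bit) or W = 1 (64-bit)."""
    bits = 64 if w else 32
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    indefinite = lowest  # 8000_0000h or 8000_0000_0000_0000h as a signed value
    if math.isnan(value) or math.isinf(value):
        return indefinite
    result = round(value)  # nearest, ties to even
    if not lowest <= result <= highest:
        return indefinite
    return result
```

The same source value can therefore convert cleanly with W = 1 and produce the indefinite value with W = 0: 3.0e9 fits a quadword but not a doubleword.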

Instruction Support 


Form         Subset   Feature Flag
CVTSD2SI     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTSD2SI    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 
Instruction Encoding 

Mnemonic                        Opcode              Description
CVTSD2SI reg32, xmm1/mem64      F2 (W0) 0F 2D /r    Converts a scalar double-precision floating-point value
                                                    in xmm1 or mem64 to a doubleword integer in reg32.
CVTSD2SI reg64, xmm1/mem64      F2 (W1) 0F 2D /r    Converts a scalar double-precision floating-point value
                                                    in xmm1 or mem64 to a quadword integer in reg64.

                                Encoding
Mnemonic                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCVTSD2SI reg32, xmm2/mem64     C4    RXB.00001        0.1111.X.11   2D /r
VCVTSD2SI reg64, xmm2/mem64     C4    RXB.00001        1.1111.X.11   2D /r


Related Instructions 

(V)CVTDQ2PD, (V)CVTPD2DQ, (V)CVTPI2PD, (V)CVTSI2SD, (V)CVTTPD2DQ, 
(V)CVTTSD2SI 

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M                        M
17   15   14:13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 
Exceptions 


                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
CVTSD2SS Convert Scalar Double-Precision Floating-Point 
VCVTSD2SS to Scalar Single-Precision Floating-Point 

Converts a scalar double-precision floating-point value to a scalar single-precision floating-point 
value and writes the converted value to the low-order 32 bits of the destination. When the result is an 
inexact value, it is rounded as specified by MXCSR.RC. 

There are legacy and extended forms of the instruction: 

CVTSD2SS 

Converts a scalar double-precision floating-point value in the low-order 64 bits of the second source 
XMM register or a 64-bit memory location to a scalar single-precision floating-point value and writes 
the converted value to the low-order 32 bits of a destination XMM register. Bits [127:32] of the 
destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination 
are not affected. 

VCVTSD2SS 

The extended form of the instruction has a 128-bit encoding only. 

Converts a scalar double-precision floating-point value in the low-order 64 bits of a source XMM 
register or a 64-bit memory location to a scalar single-precision floating-point value and writes the 
converted value to the low-order 32 bits of the destination XMM register. Bits [127:32] of the 
destination are copied from the first source XMM register. Bits [255:128] of the YMM register that 
corresponds to the destination are cleared. 
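
The difference between the two forms lies in what happens to the bits that do not receive the converted value. The sketch below models a YMM register as a list of eight single-precision lanes (lane 0 first) to make the merge behavior visible. It assumes round-to-nearest and an in-range source value; the function names are hypothetical.

```python
import struct


def to_single(value):
    """Round an in-range double to the nearest single, returned as a Python float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def cvtsd2ss(dest_ymm, src_low):
    """Legacy form: only bits [31:0] of the destination change."""
    result = list(dest_ymm)
    result[0] = to_single(src_low)
    return result


def vcvtsd2ss(src1_xmm, src2_low):
    """Extended form: bits [127:32] come from the first source, bits [255:128] are cleared."""
    return [to_single(src2_low)] + list(src1_xmm[1:4]) + [0.0] * 4
```

With a destination full of 9.0 values, the legacy form leaves lanes 1 through 7 at 9.0, whereas the extended form replaces lanes 1 through 3 with the first source's lanes and zeroes lanes 4 through 7.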

Instruction Support 


Form         Subset   Feature Flag
CVTSD2SS     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTSD2SS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 


Instruction Encoding 


Mnemonic                            Opcode         Description
CVTSD2SS xmm1, xmm2/mem64           F2 0F 5A /r    Converts a scalar double-precision floating-point
                                                   value in xmm2 or mem64 to a scalar single-precision
                                                   floating-point value in xmm1.

                                    Encoding
Mnemonic                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCVTSD2SS xmm1, xmm2, xmm3/mem64    C4    RXB.00001        X.src.X.11    5A /r


Related Instructions 

(V)CVTPD2PS, (V)CVTPS2PD, (V)CVTSS2SD 

rFLAGS Affected 


None 
MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M    M         M    M
17   15   14:13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.
Overflow, OE                S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
CVTSI2SD Convert Signed Doubleword or Quadword Integer 
VCVTSI2SD to Scalar Double-Precision Floating-Point 

Converts a signed integer value to a double-precision floating-point value and writes the converted 
value to a destination register. When the result of the conversion is an inexact value, the value is 
rounded as specified by MXCSR.RC. 

There are legacy and extended forms of the instruction: 

CVTSI2SD 

The legacy form has two encodings: 

• When REX.W = 0, converts a signed doubleword integer value from a 32-bit source general-purpose 
register or a 32-bit memory location to a double-precision floating-point value and writes 
the converted value to the low-order 64 bits of an XMM register. Bits [127:64] of the destination 
XMM register and bits [255:128] of the corresponding YMM register are not affected. 

• When REX.W = 1, converts a signed quadword integer value from a 64-bit source general-purpose 
register or a 64-bit memory location to a 64-bit double-precision floating-point value and 
writes the converted value to the low-order 64 bits of an XMM register. Bits [127:64] of the 
destination XMM register and bits [255:128] of the corresponding YMM register are not affected. 

VCVTSI2SD 

The extended form of the instruction has two 128-bit encodings: 

• When VEX.W = 0, converts a signed doubleword integer value from a 32-bit source general-purpose 
register or a 32-bit memory location to a double-precision floating-point value and writes 
the converted value to the low-order 64 bits of the destination XMM register. Bits [127:64] of the 
first source XMM register are copied to the destination XMM register. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. 

• When VEX.W = 1, converts a signed quadword integer value from a 64-bit source general-purpose 
register or a 64-bit memory location to a double-precision floating-point value and writes the 
converted value to the low-order 64 bits of the destination XMM register. Bits [127:64] of the first 
source XMM register are copied to the destination XMM register. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. 
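
Whether the conversion can be inexact depends on the source width: every 32-bit signed integer is exactly representable in double precision, while a 64-bit integer with more than 53 significant bits is not. The sketch below illustrates this under round-to-nearest (RC = 00b). The function name is hypothetical, and the returned boolean simply stands in for the condition that PE reports.

```python
def cvtsi2sd(value, w):
    """Convert a signed integer to a double; return (result, inexact)."""
    bits = 64 if w else 32
    if not -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1:
        raise ValueError(f"source does not fit in {bits} bits")
    result = float(value)             # Python rounds to nearest, ties to even
    inexact = int(result) != value    # True when rounding changed the value
    return result, inexact
```

For example, 2^53 + 1 lies halfway between two representable doubles and rounds to 2^53, so the sketch reports it as inexact.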


Instruction Support 


Form         Subset   Feature Flag
CVTSI2SD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTSI2SD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 
Instruction Encoding 

Mnemonic                             Opcode              Description
CVTSI2SD xmm1, reg32/mem32           F2 (W0) 0F 2A /r    Converts a doubleword integer in reg32 or mem32 to a
                                                         double-precision floating-point value in xmm1.
CVTSI2SD xmm1, reg64/mem64           F2 (W1) 0F 2A /r    Converts a quadword integer in reg64 or mem64 to a
                                                         double-precision floating-point value in xmm1.

                                     Encoding
Mnemonic                             VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCVTSI2SD xmm1, xmm2, reg32/mem32    C4    RXB.00001        0.src.X.11    2A /r
VCVTSI2SD xmm1, xmm2, reg64/mem64    C4    RXB.00001        1.src.X.11    2A /r


Related Instructions 

(V)CVTDQ2PD, (V)CVTPD2DQ, (V)CVTPI2PD, (V)CVTSD2SI, (V)CVTTPD2DQ, 
(V)CVTTSD2SI 

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M
17   15   14:13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 
Exceptions 


                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
CVTSI2SS Convert Signed Doubleword or Quadword Integer 
VCVTSI2SS to Scalar Single-Precision Floating-Point 

Converts a signed integer value to a single-precision floating-point value and writes the converted 
value to an XMM register. When the result of the conversion is an inexact value, the value is rounded 
as specified by MXCSR.RC. 

There are legacy and extended forms of the instruction: 

CVTSI2SS 

The legacy form has two encodings: 

• When REX.W = 0, converts a signed doubleword integer value from a 32-bit source general-purpose 
register or a 32-bit memory location to a single-precision floating-point value and writes 
the converted value to the low-order 32 bits of an XMM register. Bits [127:32] of the destination 
XMM register and bits [255:128] of the corresponding YMM register are not affected. 

• When REX.W = 1, converts a signed quadword integer value from a 64-bit source general-purpose 
register or a 64-bit memory location to a single-precision floating-point value and writes 
the converted value to the low-order 32 bits of an XMM register. Bits [127:32] of the destination 
XMM register and bits [255:128] of the corresponding YMM register are not affected. 

VCVTSI2SS 

The extended form of the instruction has two 128-bit encodings: 

• When VEX.W = 0, converts a signed doubleword integer value from a 32-bit source general-purpose 
register or a 32-bit memory location to a single-precision floating-point value and writes 
the converted value to the low-order 32 bits of the destination XMM register. Bits [127:32] of the 
first source XMM register are copied to the destination XMM register. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. 

• When VEX.W = 1, converts a signed quadword integer value from a 64-bit source general-purpose 
register or a 64-bit memory location to a single-precision floating-point value and writes the 
converted value to the low-order 32 bits of the destination XMM register. Bits [127:32] of the first 
source XMM register are copied to the destination XMM register. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. 
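
The integer-to-single conversion and its rounding can be sketched in Python. This is a minimal model rather than the hardware algorithm: it assumes MXCSR.RC selects round-to-nearest (ties to even), and the function name int_to_single_nearest is hypothetical. It returns the converted value together with a boolean that corresponds to the precision exception flag (PE).

```python
def int_to_single_nearest(n):
    """Model a signed-integer to single-precision conversion.

    Assumption: MXCSR.RC = 00b (round to nearest, ties to even).
    Returns (value, inexact); `inexact` is True when PE would be set.
    """
    magnitude = abs(n)
    # Single precision keeps 24 significant bits; lower bits are rounded away.
    extra = magnitude.bit_length() - 24
    rounded = magnitude
    if extra > 0:
        half = 1 << (extra - 1)
        remainder = magnitude & ((1 << extra) - 1)
        rounded = magnitude >> extra
        # Round up when above the halfway point, or exactly halfway and odd.
        if remainder > half or (remainder == half and rounded & 1):
            rounded += 1
        rounded <<= extra
    value = float(rounded) if n >= 0 else -float(rounded)
    return value, rounded != magnitude


if __name__ == "__main__":
    print(int_to_single_nearest(16777216))   # fits in 24 bits, exact
    print(int_to_single_nearest(16777217))   # needs 25 bits, rounded
```

Integers whose magnitude fits in 24 bits convert exactly. Larger doubleword and quadword values may be rounded, which is the only case in which this model reports an inexact result.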


Instruction Support 


Form         Subset   Feature Flag
CVTSI2SS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VCVTSI2SS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
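
As a rough illustration of the feature-flag test, the sketch below takes the EDX and ECX values returned by CPUID Fn0000_0001 as plain integers and reports which forms of the instruction the processor advertises. How those register values are obtained is outside the sketch, and the function name supported_forms is hypothetical.

```python
SSE_BIT = 25   # CPUID Fn0000_0001_EDX[SSE]
AVX_BIT = 28   # CPUID Fn0000_0001_ECX[AVX]


def supported_forms(edx, ecx):
    """Return the instruction forms advertised by the given CPUID register values."""
    forms = []
    if (edx >> SSE_BIT) & 1:
        forms.append("CVTSI2SS")
    if (ecx >> AVX_BIT) & 1:
        forms.append("VCVTSI2SS")
    return forms


if __name__ == "__main__":
    # Example register values chosen for illustration only.
    print(supported_forms(edx=1 << 25, ecx=1 << 28))
```

A set AVX bit alone does not guarantee that the extended form can execute. The exception table for this instruction lists further conditions, such as CR4.OSXSAVE and XFEATURE_ENABLED_MASK[2:1], that cause #UD when they are not met.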
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Instruction Encoding 

Mnemonic                       Opcode             Description
CVTSI2SS xmm1, reg32/mem32     F3 (W0) 0F 2A /r   Converts a doubleword integer in reg32 or mem32 to a
                                                  single-precision floating-point value in xmm1.
CVTSI2SS xmm1, reg64/mem64     F3 (W1) 0F 2A /r   Converts a quadword integer in reg64 or mem64 to a
                                                  single-precision floating-point value in xmm1.

Mnemonic                              Encoding
                                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTSI2SS xmm1, xmm2, reg32/mem32     C4   RXB.00001       0.src.X.10   2A /r
VCVTSI2SS xmm1, xmm2, reg64/mem64     C4   RXB.00001       1.src.X.10   2A /r


Related Instructions 

(V)CVTDQ2PS, (V)CVTPS2DQ, (V)CVTSS2SI, (V)CVTTPS2DQ, (V)CVTTSS2SI 

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 
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Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 








26568 — Rev. 3.23—February 2019 




CVTSS2SD Convert Scalar Single-Precision Floating-Point 
VCVTSS2SD to Scalar Double-Precision Floating-Point 

Converts a scalar single-precision floating-point value to a scalar double-precision floating-point 
value and writes the converted value to the low-order 64 bits of the destination. 

There are legacy and extended forms of the instruction: 

CVTSS2SD 

Converts a scalar single-precision floating-point value in the low-order 32 bits of a source XMM
register or a 32-bit memory location to a scalar double-precision floating-point value and writes the
converted value to the low-order 64 bits of a destination XMM register. Bits [127:64] of the destination
and bits [255:128] of the corresponding YMM register are not affected.

VCVTSS2SD 

The extended form of the instruction has a 128-bit encoding only.

Converts a scalar single-precision floating-point value in the low-order 32 bits of the second source
XMM register or 32-bit memory location to a scalar double-precision floating-point value and writes
the converted value to the low-order 64 bits of the destination XMM register. Bits [127:64] of the
destination are copied from the first source XMM register. Bits [255:128] of the YMM register that
corresponds to the destination are cleared.
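
Widening a single-precision value to double precision never loses information, so the interesting part of the conversion is which status flags the source operand can raise. The sketch below is a minimal Python model under stated assumptions: it works on the raw 32-bit encoding of the source, assumes MXCSR.DAZ = 0, and uses the hypothetical name cvtss2sd_model.

```python
import struct


def cvtss2sd_model(bits32):
    """Model the scalar single-to-double conversion for one raw 32-bit encoding.

    Assumption: MXCSR.DAZ = 0, so denormal sources are not treated as zero.
    Returns (value, flags) where flags is a subset of {"IE", "DE"}.
    """
    exponent = (bits32 >> 23) & 0xFF
    fraction = bits32 & 0x7FFFFF
    flags = set()
    if exponent == 0 and fraction != 0:
        flags.add("DE")   # source operand is a denormal value
    if exponent == 0xFF and fraction != 0 and not (fraction >> 22) & 1:
        flags.add("IE")   # source operand is a signaling NaN
    # Every single-precision value is exactly representable in double precision.
    value = struct.unpack("<f", struct.pack("<I", bits32))[0]
    return value, flags


if __name__ == "__main__":
    print(cvtss2sd_model(0x3F800000))   # encoding of 1.0
    print(cvtss2sd_model(0x00000001))   # smallest positive denormal
```

The model never reports a precision exception, which agrees with the flags listed for this instruction: only IE and DE can be modified.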

Instruction Support 


Form         Subset   Feature Flag
CVTSS2SD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTSS2SD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 


Mnemonic                      Opcode        Description
CVTSS2SD xmm1, xmm2/mem32     F3 0F 5A /r   Converts a scalar single-precision floating-point value
                                            in xmm2 or mem32 to a scalar double-precision
                                            floating-point value in xmm1.

Mnemonic                             Encoding
                                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTSS2SD xmm1, xmm2, xmm3/mem32     C4   RXB.00001       X.src.X.10   5A /r


Related Instructions 

(V)CVTPD2PS, (V)CVTPS2PD, (V)CVTSD2SS 
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MXCSR Flags Affected 


MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                                            M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
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CVTSS2SI Convert Scalar Single-Precision Floating-Point 

VCVTSS2SI to Signed Doubleword or Quadword Integer 

Converts a single-precision floating-point value to a signed integer value and writes the converted 
value to a general-purpose register. 

When the result of the conversion is an inexact value, the value is rounded as specified by
MXCSR.RC. When the floating-point value is a NaN or an infinity, or the result of the conversion
falls outside the range of a signed doubleword (-2^31 to +2^31 - 1) or quadword (-2^63 to +2^63 - 1),
the indefinite integer value (8000_0000h for 32-bit integers, 8000_0000_0000_0000h for 64-bit
integers) is returned when the invalid-operation exception (IE) is masked.

There are legacy and extended forms of the instruction:

CVTSS2SI 

The legacy form has two encodings: 

• When REX.W = 0, converts a single-precision floating-point value in the low-order 32 bits of an 
XMM register or a 32-bit memory location to a 32-bit signed integer value and writes the 
converted value to a 32-bit general-purpose register. 

• When REX.W = 1, converts a single-precision floating-point value in the low-order 32 bits of an 
XMM register or a 32-bit memory location to a 64-bit signed integer value and writes the 
converted value to a 64-bit general-purpose register. 

VCVTSS2SI 

The extended form of the instruction has two 128-bit encodings: 

• When VEX.W = 0, converts a single-precision floating-point value in the low-order 32 bits of an 
XMM register or a 32-bit memory location to a 32-bit signed integer value and writes the 
converted value to a 32-bit general-purpose register. 

• When VEX.W = 1, converts a single-precision floating-point value in the low-order 32 bits of an 
XMM register or a 32-bit memory location to a 64-bit signed integer value and writes the 
converted value to a 64-bit general-purpose register. 
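
The rounding and out-of-range behavior described above can be modeled in a few lines of Python. This is a sketch, not a bit-exact emulation: the source is passed as a Python float assumed to hold a value representable in single precision, all exceptions are assumed masked, and the names cvtss2si_model and ROUNDERS are hypothetical. The rc argument stands in for the four MXCSR.RC settings.

```python
import math

# One rounding function per MXCSR.RC setting.
ROUNDERS = {
    "nearest": round,      # RC = 00b, round to nearest, ties to even
    "down": math.floor,    # RC = 01b, round toward negative infinity
    "up": math.ceil,       # RC = 10b, round toward positive infinity
    "zero": math.trunc,    # RC = 11b, round toward zero
}


def cvtss2si_model(x, width=32, rc="nearest"):
    """Return (signed result, flags) for a masked conversion to a `width`-bit integer."""
    lowest = -(1 << (width - 1))
    highest = (1 << (width - 1)) - 1
    indefinite = lowest   # 8000_0000h / 8000_0000_0000_0000h read as a signed value
    if math.isnan(x) or math.isinf(x):
        return indefinite, {"IE"}
    result = ROUNDERS[rc](x)
    if not lowest <= result <= highest:
        return indefinite, {"IE"}
    flags = {"PE"} if result != x else set()
    return result, flags


if __name__ == "__main__":
    print(cvtss2si_model(2.5))              # ties-to-even rounding
    print(cvtss2si_model(2.5, rc="up"))     # rounding toward +infinity
    print(cvtss2si_model(float("nan")))     # indefinite integer value
```

Passing width=64 corresponds to the REX.W = 1 or VEX.W = 1 encodings, which widen the destination and therefore the range of values that convert without raising IE.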

Instruction Support 


Form         Subset   Feature Flag
CVTSS2SI     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VCVTSS2SI    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
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Instruction Encoding 


Mnemonic                       Opcode             Description
CVTSS2SI reg32, xmm1/mem32     F3 (W0) 0F 2D /r   Converts a single-precision floating-point value in
                                                  xmm1 or mem32 to a 32-bit integer value in reg32.
CVTSS2SI reg64, xmm1/mem32     F3 (W1) 0F 2D /r   Converts a single-precision floating-point value in
                                                  xmm1 or mem32 to a 64-bit integer value in reg64.

Mnemonic                        Encoding
                                VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTSS2SI reg32, xmm1/mem32     C4   RXB.00001       0.1111.X.10  2D /r
VCVTSS2SI reg64, xmm1/mem32     C4   RXB.00001       1.1111.X.10  2D /r
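
To make the field notation in the VEX encoding table concrete, the following Python sketch packs the three-byte VEX prefix and appends the opcode and a register-direct ModRM byte for VCVTSS2SI. It is an illustration under stated assumptions: only registers 0 to 7 are handled, so the R, X, and B extension bits stay at their default, the ignored L bit (shown as X in the table) is set to 0, and the helper names vex3_prefix and vcvtss2si_reg are hypothetical.

```python
def vex3_prefix(r=0, x=0, b=0, map_select=0b00001, w=0, vvvv=0b1111, l=0, pp=0b10):
    """Pack the three VEX prefix bytes (C4h form).

    r, x, b are logical register-extension bits and are stored inverted.
    vvvv is passed in its encoded form, as it appears in the table (1111b = unused).
    pp = 10b corresponds to the F3 prefix of the legacy form.
    """
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (map_select & 0x1F)
    byte2 = ((w & 1) << 7) | ((vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 0x3)
    return bytes([0xC4, byte1, byte2])


def vcvtss2si_reg(dest_gpr, src_xmm, w=0):
    """Encode VCVTSS2SI with a register source; register numbers must be 0..7."""
    modrm = 0xC0 | ((dest_gpr & 7) << 3) | (src_xmm & 7)   # mod = 11b, reg = dest, rm = source
    return vex3_prefix(w=w) + bytes([0x2D, modrm])


if __name__ == "__main__":
    print(vcvtss2si_reg(0, 0).hex())        # 32-bit destination, W = 0
    print(vcvtss2si_reg(0, 0, w=1).hex())   # 64-bit destination, W = 1
```

Only the W bit differs between the two encodings, which is what selects a 32-bit or 64-bit destination. Assemblers may emit the shorter two-byte C5h prefix for the W = 0 case; the sketch always produces the C4h form shown in the table.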


Related Instructions 

(V)CVTDQ2PS, (V)CVTPS2DQ, (V)CVTSI2SS, (V)CVTTPS2DQ, (V)CVTTSS2SI 


MXCSR Flags Affected 


MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M                   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 
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Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
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CVTTPD2DQ Convert Packed Double-Precision Floating-Point 
VCVTTPD2DQ to Packed Doubleword Integer, Truncated 

Converts packed double-precision floating-point values to packed signed doubleword integer values 
and writes the converted values to the destination. 

When the result is an inexact value, it is truncated (rounded toward zero). When the floating-point
value is a NaN or an infinity, or the result of the conversion falls outside the range of a signed
doubleword (-2^31 to +2^31 - 1), the instruction returns the 32-bit indefinite integer value
(8000_0000h) when the invalid-operation exception (IE) is masked.


There are legacy and extended forms of the instruction: 

CVTTPD2DQ 

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory 
location to two packed signed doubleword integers and writes the converted values to the two low- 
order doublewords of the destination XMM register. Bits [127:64] of the destination are cleared. Bits 
[255:128] of the YMM register that corresponds to the destination are not affected. 

VCVTTPD2DQ 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory
location to two signed doubleword values and writes the converted values to the lower two doubleword
elements of the destination XMM register. Bits [255:128] of the YMM register that corresponds
to the destination are cleared.

YMM Encoding 

Converts four packed double-precision floating-point values in a YMM register or a 256-bit memory
location to four signed doubleword integer values and writes the converted values to an XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
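
The two extended encodings can be modeled by treating the destination YMM register as eight doubleword lanes. The Python sketch below is illustrative only: it assumes all exceptions are masked, and it assumes that the 128-bit encoding clears bits [127:64] of the destination, which the text above states explicitly only for the legacy form. The names truncate_to_dword and vcvttpd2dq_model are hypothetical.

```python
import math

INDEFINITE32 = 0x80000000   # 32-bit indefinite integer value


def truncate_to_dword(x):
    """Return the 32-bit pattern produced by a masked truncating conversion."""
    if math.isnan(x) or math.isinf(x):
        return INDEFINITE32
    result = math.trunc(x)   # round toward zero
    if not -(1 << 31) <= result <= (1 << 31) - 1:
        return INDEFINITE32
    return result & 0xFFFFFFFF   # two's-complement encoding


def vcvttpd2dq_model(source):
    """source holds 2 doubles (XMM encoding) or 4 doubles (YMM encoding).

    Returns the eight doubleword lanes of the destination YMM register, lowest first.
    """
    if len(source) not in (2, 4):
        raise ValueError("expected 2 or 4 double-precision elements")
    lanes = [truncate_to_dword(x) for x in source]
    # Lanes above the converted results are cleared (an assumption for bits
    # [127:64] of the 128-bit encoding, stated in the text for bits [255:128]).
    return lanes + [0] * (8 - len(lanes))


if __name__ == "__main__":
    print([hex(v) for v in vcvttpd2dq_model([1.9, -1.9])])
```

Because the result elements are half the width of the source elements, even the 256-bit encoding fills only the low 128 bits of the destination.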

Instruction Support 


Form          Subset   Feature Flag
CVTTPD2DQ     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTTPD2DQ    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
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Instruction Encoding 

Mnemonic                        Opcode        Description
CVTTPD2DQ xmm1, xmm2/mem128     66 0F E6 /r   Converts two packed double-precision floating-point
                                              values in xmm2 or mem128 to packed doubleword
                                              integers in xmm1. Truncates inexact result.

Mnemonic                         Encoding
                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTTPD2DQ xmm1, xmm2/mem128     C4   RXB.00001       X.1111.0.01  E6 /r
VCVTTPD2DQ xmm1, ymm2/mem256     C4   RXB.00001       X.1111.1.01  E6 /r

Related Instructions 

(V)CVTDQ2PD, (V)CVTPD2DQ, (V)CVTPI2PD, (V)CVTSD2SI, (V)CVTSI2SD, (V)CVTTSD2SI 

MXCSR Flags Affected 

MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M                   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0
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Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
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CVTTPS2DQ Convert Packed Single-Precision Floating-Point 
VCVTTPS2DQ to Packed Doubleword Integers, Truncated 

Converts packed single-precision floating-point values to packed signed doubleword integer values 
and writes the converted values to the destination. 

When the result of the conversion is an inexact value, the value is truncated (rounded toward zero).
When the floating-point value is a NaN or an infinity, or the result of the conversion falls outside
the range of a signed doubleword (-2^31 to +2^31 - 1), the instruction returns the 32-bit indefinite
integer value (8000_0000h) when the invalid-operation exception (IE) is masked.

There are legacy and extended forms of the instruction: 

CVTTPS2DQ 

Converts four packed single-precision floating-point values in an XMM register or a 128-bit memory 
location to four packed signed doubleword integer values and writes the converted values to an XMM 
register. The high-order 128-bits of the corresponding YMM register are not affected. 

VCVTTPS2DQ 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Converts four packed single-precision floating-point values in an XMM register or a 128-bit memory 
location to four packed signed doubleword integer values and writes the converted values to an XMM 
register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

Converts eight packed single-precision floating-point values in a YMM register or a 256-bit memory 
location to eight packed signed doubleword integer values and writes the converted values to a YMM 
register. 
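
The three forms differ mainly in how many elements they convert and in what happens to the upper half of the destination YMM register. The following Python sketch models that difference with a register represented as eight doubleword lanes. It assumes masked exceptions, and the names truncate_lane and cvttps2dq_model, as well as the form labels, are hypothetical.

```python
import math


def truncate_lane(x):
    """Truncating conversion of one element, returning a 32-bit pattern."""
    if math.isnan(x) or math.isinf(x):
        return 0x80000000            # indefinite integer value
    result = math.trunc(x)
    if not -(1 << 31) <= result < (1 << 31):
        return 0x80000000
    return result & 0xFFFFFFFF


def cvttps2dq_model(dest_ymm, source, form):
    """Return the new eight-lane contents of the destination YMM register.

    dest_ymm: the destination's previous contents (eight doubleword lanes).
    form: "legacy", "vex128", or "vex256" (labels chosen for this sketch).
    """
    count = 8 if form == "vex256" else 4
    if len(source) != count or len(dest_ymm) != 8:
        raise ValueError("unexpected operand size")
    low = [truncate_lane(x) for x in source]
    if form == "legacy":
        return low + list(dest_ymm[4:])   # bits [255:128] are not affected
    if form == "vex128":
        return low + [0, 0, 0, 0]         # bits [255:128] are cleared
    return low                            # all 256 bits are written


if __name__ == "__main__":
    old = [0xAAAAAAAA] * 8
    print([hex(v) for v in cvttps2dq_model(old, [1.5, -1.5, 0.0, 3e9], "legacy")])
```

Comparing the "legacy" and "vex128" results for the same inputs shows that the low four lanes are identical and only the treatment of the upper four lanes changes.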

Instruction Support 


Form          Subset   Feature Flag
CVTTPS2DQ     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTTPS2DQ    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                        Opcode        Description
CVTTPS2DQ xmm1, xmm2/mem128     F3 0F 5B /r   Converts four packed single-precision floating-point
                                              values in xmm2 or mem128 to four packed doubleword
                                              integers in xmm1. Truncates inexact result.

Mnemonic                         Encoding
                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTTPS2DQ xmm1, xmm2/mem128     C4   RXB.00001       X.1111.0.10  5B /r
VCVTTPS2DQ ymm1, ymm2/mem256     C4   RXB.00001       X.1111.1.10  5B /r
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Related Instructions 

(V)CVTDQ2PS, (V)CVTPS2DQ, (V)CVTSI2SS, (V)CVTSS2SI, (V)CVTTSS2SI 

MXCSR Flags Affected

MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M                   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
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CVTTSD2SI Convert Scalar Double-Precision Floating-Point 
VCVTTSD2SI to Signed Double- or Quadword Integer, Truncated 

Converts a scalar double-precision floating-point value to a signed integer value and writes the
converted value to a general-purpose register.

When the result of the conversion is an inexact value, the value is truncated (rounded toward zero).
When the floating-point value is a NaN or an infinity, or the result of the conversion falls outside
the range of a signed doubleword (-2^31 to +2^31 - 1) or quadword (-2^63 to +2^63 - 1), the instruction
returns the indefinite integer value (8000_0000h for 32-bit integers, 8000_0000_0000_0000h for
64-bit integers) when the invalid-operation exception (IE) is masked.

There are legacy and extended forms of the instruction: 

CVTTSD2SI 

The legacy form of the instruction has two encodings: 

• When REX.W = 0, converts a scalar double-precision floating-point value in the low-order 64 bits 
of an XMM register or a 64-bit memory location to a 32-bit signed integer and writes the converted 
value to a 32-bit general purpose register. 

• When REX.W = 1, converts a scalar double-precision floating-point value in the low-order 64 bits 
of an XMM register or a 64-bit memory location to a 64-bit sign-extended integer and writes the 
converted value to a 64-bit general purpose register. 

VCVTTSD2SI 

The extended form of the instruction has two 128-bit encodings. 

• When VEX.W = 0, converts a scalar double-precision floating-point value in the low-order 64 bits 
of an XMM register or a 64-bit memory location to a 32-bit signed integer and writes the converted 
value to a 32-bit general purpose register. 

• When VEX.W = 1, converts a scalar double-precision floating-point value in the low-order 64 bits 
of an XMM register or a 64-bit memory location to a 64-bit sign-extended integer and writes the 
converted value to a 64-bit general purpose register. 
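
The effect of the W bit on the valid range can be seen in a short Python model. This is a sketch under the assumption that all exceptions are masked; the function name cvttsd2si_model is hypothetical, and the w argument stands in for REX.W or VEX.W. Results are returned as signed integers, so the indefinite value appears as the most negative value of the destination width.

```python
import math


def cvttsd2si_model(x, w=0):
    """Truncating double-to-integer conversion; w selects a 32-bit or 64-bit result."""
    bits = 64 if w else 32
    lowest = -(1 << (bits - 1))      # also the indefinite integer value, read as signed
    highest = (1 << (bits - 1)) - 1
    if math.isnan(x) or math.isinf(x):
        return lowest
    result = math.trunc(x)           # always toward zero, independent of MXCSR.RC
    return result if lowest <= result <= highest else lowest


if __name__ == "__main__":
    print(cvttsd2si_model(-2.7))                  # truncates toward zero
    print(cvttsd2si_model(2147483648.0))          # out of range for 32 bits
    print(cvttsd2si_model(2147483648.0, w=1))     # in range for 64 bits
```

The same source value can therefore yield the indefinite integer value with W = 0 and an ordinary result with W = 1.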

Instruction Support 


Form          Subset   Feature Flag
CVTTSD2SI     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTTSD2SI    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
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Instruction Encoding 


Mnemonic                        Opcode             Description
CVTTSD2SI reg32, xmm1/mem64     F2 (W0) 0F 2C /r   Converts a scalar double-precision floating-point
                                                   value in xmm1 or mem64 to a doubleword integer in
                                                   reg32. Truncates inexact result.
CVTTSD2SI reg64, xmm1/mem64     F2 (W1) 0F 2C /r   Converts a scalar double-precision floating-point
                                                   value in xmm1 or mem64 to a quadword integer in
                                                   reg64. Truncates inexact result.

Mnemonic                         Encoding
                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTTSD2SI reg32, xmm2/mem64     C4   RXB.00001       0.1111.X.11  2C /r
VCVTTSD2SI reg64, xmm2/mem64     C4   RXB.00001       1.1111.X.11  2C /r


Related Instructions 

(V)CVTDQ2PD, (V)CVTPD2DQ, (V)CVTPI2PD, (V)CVTSD2SI, (V)CVTSI2SD, 
(V)CVTTPD2DQ 


MXCSR Flags Affected 


MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M                   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 
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Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
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CVTTSS2SI Convert Scalar Single-Precision Floating-Point 
VCVTTSS2SI to Signed Double or Quadword Integer, Truncated 

Converts a single-precision floating-point value to a signed integer value and writes the converted 
value to a general-purpose register. 

When the result of the conversion is an inexact value, the value is truncated (rounded toward zero).
When the floating-point value is a NaN or an infinity, or the result of the conversion falls outside
the range of a signed doubleword (-2^31 to +2^31 - 1) or quadword (-2^63 to +2^63 - 1), the indefinite
integer value (8000_0000h for 32-bit integers, 8000_0000_0000_0000h for 64-bit integers) is returned
when the invalid-operation exception (IE) is masked.

There are legacy and extended forms of the instruction:

CVTTSS2SI 

The legacy form of the instruction has two encodings: 

• When REX.W = 0, converts a single-precision floating-point value in the low-order 32 bits of an 
XMM register or a 32-bit memory location to a 32-bit signed integer value and writes the 
converted value to a 32-bit general-purpose register. Bits [255:128] of the YMM register that 
corresponds to the source are not affected. 

• When REX.W = 1, converts a single-precision floating-point value in the low-order 32 bits of an 
XMM register or a 32-bit memory location to a 64-bit signed integer value and writes the 
converted value to a 64-bit general-purpose register. Bits [255:128] of the YMM register that 
corresponds to the source are not affected. 

VCVTTSS2SI 

The extended form of the instruction has two 128-bit encodings:

• When VEX.W = 0, converts a single-precision floating-point value in the low-order 32 bits of an 
XMM register or a 32-bit memory location to a 32-bit signed integer value and writes the 
converted value to a 32-bit general-purpose register. Bits [255:128] of the YMM register that 
corresponds to the source are cleared. 

• When VEX.W = 1, converts a single-precision floating-point value in the low-order 32 bits of an 
XMM register or a 32-bit memory location to a 64-bit signed integer value and writes the 
converted value to a 64-bit general-purpose register. Bits [255:128] of the YMM register that 
corresponds to the source are cleared. 

Instruction Support 


Form          Subset   Feature Flag
CVTTSS2SI     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VCVTTSS2SI    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
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Instruction Encoding 


Mnemonic                        Opcode             Description
CVTTSS2SI reg32, xmm1/mem32     F3 (W0) 0F 2C /r   Converts a single-precision floating-point value in
                                                   xmm1 or mem32 to a 32-bit integer value in reg32.
                                                   Truncates inexact result.
CVTTSS2SI reg64, xmm1/mem32     F3 (W1) 0F 2C /r   Converts a single-precision floating-point value in
                                                   xmm1 or mem32 to a 64-bit integer value in reg64.
                                                   Truncates inexact result.

Mnemonic                         Encoding
                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTTSS2SI reg32, xmm1/mem32     C4   RXB.00001       0.1111.X.10  2C /r
VCVTTSS2SI reg64, xmm1/mem32     C4   RXB.00001       1.1111.X.10  2C /r


Related Instructions 

(V)CVTDQ2PS, (V)CVTPS2DQ, (V)CVTSI2SS, (V)CVTSS2SI, (V)CVTTPS2DQ 


MXCSR Flags Affected 


MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M                   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 




Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
DIVPD, VDIVPD

Divide Packed Double-Precision Floating-Point

Divides each of the packed double-precision floating-point values of the first source operand by the 
corresponding packed double-precision floating-point values of the second source operand and writes 
the quotients to the destination. 

There are legacy and extended forms of the instruction:

DIVPD 

Divides two packed double-precision floating-point values in the first source XMM register by the 
corresponding packed double-precision floating-point values in either a second source XMM register 
or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VDIVPD 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

Divides two packed double-precision floating-point values in the first source XMM register by the
corresponding packed double-precision floating-point values in either a second source XMM register
or a 128-bit memory location and writes the two results to a destination XMM register. Bits [255:128]
of the YMM register that corresponds to the destination are cleared.

YMM Encoding

Divides four packed double-precision floating-point values in the first source YMM register by the
corresponding packed double-precision floating-point values in either a second source YMM register
or a 256-bit memory location and writes the four results to a destination YMM register.
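The three forms described above differ mainly in how many lanes are divided and in what happens to bits [255:128] of the destination. The following Python sketch models that behaviour under stated assumptions: a YMM register is represented as a list of four doubles (index 0 is bits [63:0]), all exceptions are masked, and any NaN result is returned as a generic NaN rather than the specific QNaN the hardware would produce. The function names are illustrative only.

```python
import math

def div_lane(a: float, b: float) -> float:
    """One lane of a masked IEEE 754 division (Python itself raises on x / 0.0)."""
    if math.isnan(a) or math.isnan(b):
        return math.nan                      # NaN operands propagate
    if b == 0.0:
        if a == 0.0:
            return math.nan                  # 0/0 is an invalid operation
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(math.inf, sign) # nonzero / 0 gives a signed infinity
    return a / b

def divpd(dest_ymm, src_xmm):
    """Legacy DIVPD: low two lanes are divided, bits [255:128] are not affected."""
    low = [div_lane(dest_ymm[i], src_xmm[i]) for i in range(2)]
    return low + list(dest_ymm[2:4])

def vdivpd_xmm(src1_xmm, src2_xmm):
    """VDIVPD, XMM encoding: two quotients, bits [255:128] are cleared."""
    return [div_lane(src1_xmm[i], src2_xmm[i]) for i in range(2)] + [0.0, 0.0]

def vdivpd_ymm(src1_ymm, src2_ymm):
    """VDIVPD, YMM encoding: four quotients."""
    return [div_lane(src1_ymm[i], src2_ymm[i]) for i in range(4)]
```

For example, `divpd([8.0, 9.0, 7.0, 7.0], [2.0, 3.0])` leaves the two upper lanes at 7.0, whereas `vdivpd_xmm([8.0, 9.0], [2.0, 3.0])` returns zeros there.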

Instruction Support 


Form      Subset   Feature Flag
DIVPD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VDIVPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
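As a rough illustration of how the feature flags in the table are tested, the sketch below checks individual bits in register values that are assumed to have already been read from CPUID Fn0000_0001 by some other means (Python cannot execute CPUID directly). The OSXSAVE bit position (ECX bit 27) is not given in this section and is an assumption taken from the general CPUID definition; the XFEATURE_ENABLED_MASK[2:1] = 11b requirement comes from the exception tables in this chapter. All function names are hypothetical.

```python
SSE2_EDX_BIT = 26      # CPUID Fn0000_0001_EDX[SSE2]
AVX_ECX_BIT = 28       # CPUID Fn0000_0001_ECX[AVX]
OSXSAVE_ECX_BIT = 27   # CPUID Fn0000_0001_ECX[OSXSAVE] (assumed bit position)

def bit_set(value: int, bit: int) -> bool:
    """Return True when the given bit of a 32-bit register value is 1."""
    return (value >> bit) & 1 == 1

def has_sse2(edx: int) -> bool:
    """Legacy DIVPD requires the SSE2 feature flag."""
    return bit_set(edx, SSE2_EDX_BIT)

def avx_usable(ecx: int, xfeature_enabled_mask: int) -> bool:
    """VDIVPD requires AVX, OSXSAVE, and XFEATURE_ENABLED_MASK[2:1] = 11b."""
    if not (bit_set(ecx, AVX_ECX_BIT) and bit_set(ecx, OSXSAVE_ECX_BIT)):
        return False
    return (xfeature_enabled_mask >> 1) & 0b11 == 0b11
```

The second check mirrors the #UD conditions listed for the AVX form: the feature bit alone is not sufficient if the operating system has not enabled the extended state.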


Instruction Encoding 

Mnemonic                   Opcode        Description
DIVPD xmm1, xmm2/mem128    66 0F 5E /r   Divides packed double-precision floating-point values in
                                         xmm1 by the packed double-precision floating-point
                                         values in xmm2 or mem128. Writes quotients to xmm1.

                                  Encoding
Mnemonic                          VEX   RXB.mapselect   W.vvvv.L.pp   Opcode
VDIVPD xmm1, xmm2, xmm3/mem128    C4    RXB.00001       X.src.0.01    5E /r
VDIVPD ymm1, ymm2, ymm3/mem256    C4    RXB.00001       X.src.1.01    5E /r
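To make the VEX field notation above more concrete, here is a minimal Python sketch that assembles the three-byte C4 form for the register-to-register case. It assumes the usual VEX conventions, which are not spelled out in this section: the R, X, B and vvvv fields are stored in inverted (ones' complement) form, and the ModRM byte uses mod = 11b with the destination in the reg field and the second source in the r/m field. The helper names are hypothetical, and real assemblers may prefer the shorter two-byte C5 form when it is available.

```python
def vex3(r: int, x: int, b: int, mapselect: int, w: int, vvvv: int, l: int, pp: int) -> bytes:
    """Build a 3-byte VEX prefix from logical (non-inverted) field values."""
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (mapselect & 0x1F)
    byte2 = ((w & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 0b11)
    return bytes([0xC4, byte1, byte2])

def encode_vdivpd_reg(dest: int, src1: int, src2: int, ymm: bool = False) -> bytes:
    """Encode VDIVPD dest, src1, src2 for register operands numbered 0..15."""
    r = dest >> 3                 # extension bit for the ModRM.reg field
    b = src2 >> 3                 # extension bit for the ModRM.r/m field
    modrm = 0xC0 | ((dest & 7) << 3) | (src2 & 7)
    prefix = vex3(r, 0, b, 0b00001, 0, src1, 1 if ymm else 0, 0b01)
    return prefix + bytes([0x5E, modrm])
```

With these assumptions, `encode_vdivpd_reg(1, 2, 3)` yields the bytes C4 E1 69 5E CB for `VDIVPD xmm1, xmm2, xmm3`, and setting `ymm=True` changes only the L bit.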




Related Instructions 

(V)DIVPS, (V)DIVSD, (V)DIVSS 

MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M    M    M    M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.
Division by zero, ZE        S     S     X     Division of finite dividend by zero-value divisor.
Overflow, OE                S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
DIVPS, VDIVPS

Divide Packed Single-Precision Floating-Point

Divides each of the packed single-precision floating-point values of the first source operand by the 
corresponding packed single-precision floating-point values of the second source operand and writes 
the quotients to the destination. 

There are legacy and extended forms of the instruction:

DIVPS 

Divides four packed single-precision floating-point values in the first source XMM register by the 
corresponding packed single-precision floating-point values in either a second source XMM register 
or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VDIVPS 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

Divides four packed single-precision floating-point values in the first source XMM register by the
corresponding packed single-precision floating-point values in either a second source XMM register
or a 128-bit memory location and writes the four results to a destination XMM register. Bits
[255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

Divides eight packed single-precision floating-point values in the first source YMM register by the
corresponding packed single-precision floating-point values in either a second source YMM register
or a 256-bit memory location and writes the eight results to a destination YMM register.

Instruction Support 


Form      Subset   Feature Flag
DIVPS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VDIVPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                   Opcode     Description
DIVPS xmm1, xmm2/mem128    0F 5E /r   Divides packed single-precision floating-point values in
                                      xmm1 by the corresponding values in xmm2 or mem128.
                                      Writes quotients to xmm1.

                                  Encoding
Mnemonic                          VEX   RXB.mapselect   W.vvvv.L.pp   Opcode
VDIVPS xmm1, xmm2, xmm3/mem128    C4    RXB.00001       X.src.0.00    5E /r
VDIVPS ymm1, ymm2, ymm3/mem256    C4    RXB.00001       X.src.1.00    5E /r
Related Instructions

(V)DIVPD, (V)DIVSD, (V)DIVSS 

MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M    M    M    M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.
Division by zero, ZE        S     S     X     Division of finite dividend by zero-value divisor.
Overflow, OE                S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
DIVSD, VDIVSD

Divide Scalar Double-Precision Floating-Point

Divides the double-precision floating-point value in the low-order quadword of the first source
operand by the double-precision floating-point value in the low-order quadword of the second source
operand and writes the quotient to the low-order quadword of the destination.

There are legacy and extended forms of the instruction:

DIVSD

The first source operand is an XMM register and the second source operand is either an XMM
register or a 64-bit memory location. The first source register is also the destination register.
Bits [127:64] of the destination are not affected. Bits [255:128] of the YMM register that
corresponds to the destination are not affected.

VDIVSD

The extended form of the instruction has a 128-bit encoding only.

The first source operand is an XMM register and the second source operand is either an XMM
register or a 64-bit memory location. Bits [127:64] of the first source operand are copied to bits
[127:64] of the destination. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
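The difference between the two scalar forms is only in how the bits above the low quadword are filled. A minimal Python sketch of that merge behaviour follows, assuming a YMM register is modelled as a list of four doubles (index 0 is bits [63:0]) and that the divisor is nonzero and no operand is a NaN, so exception handling is left out. The function names are illustrative.

```python
def divsd(dest_ymm, src2_low):
    """Legacy DIVSD: only lane 0 changes; bits [255:64] keep their old contents."""
    result = list(dest_ymm)
    result[0] = dest_ymm[0] / src2_low
    return result

def vdivsd(src1_xmm, src2_low):
    """VDIVSD: quotient in lane 0, bits [127:64] copied from the first source,
    bits [255:128] cleared."""
    return [src1_xmm[0] / src2_low, src1_xmm[1], 0.0, 0.0]
```

For instance, `divsd([9.0, 5.0, 6.0, 7.0], 3.0)` changes only the first element, while `vdivsd([9.0, 5.0], 3.0)` returns the quotient, the copied upper quadword, and two zeroed lanes.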

Instruction Support 


Form      Subset   Feature Flag
DIVSD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VDIVSD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                  Opcode        Description
DIVSD xmm1, xmm2/mem64    F2 0F 5E /r   Divides the double-precision floating-point value in the
                                        low-order 64 bits of xmm1 by the corresponding value in
                                        xmm2 or mem64. Writes quotient to xmm1.

                                 Encoding
Mnemonic                         VEX   RXB.mapselect   W.vvvv.L.pp   Opcode
VDIVSD xmm1, xmm2, xmm3/mem64    C4    RXB.00001       X.src.X.11    5E /r


Related Instructions 

(V)DIVPD, (V)DIVPS, (V)DIVSS 
MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M    M    M    M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.
Division by zero, ZE        S     S     X     Division of finite dividend by zero-value divisor.
Overflow, OE                S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
DIVSS, VDIVSS

Divide Scalar Single-Precision Floating-Point

Divides the single-precision floating-point value in the low-order doubleword of the first source
operand by the single-precision floating-point value in the low-order doubleword of the second
source operand and writes the quotient to the low-order doubleword of the destination.

There are legacy and extended forms of the instruction:

DIVSS

The first source operand is an XMM register and the second source operand is either an XMM
register or a 32-bit memory location. The first source register is also the destination register.
Bits [127:32] of the destination are not affected. Bits [255:128] of the YMM register that
corresponds to the destination are not affected.

VDIVSS

The extended form of the instruction has a 128-bit encoding only.

The first source operand is an XMM register and the second source operand is either an XMM
register or a 32-bit memory location. The destination is a third XMM register. Bits [127:32] of the
first source operand are copied to bits [127:32] of the destination. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

Instruction Support 


Form      Subset   Feature Flag
DIVSS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VDIVSS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                  Opcode        Description
DIVSS xmm1, xmm2/mem32    F3 0F 5E /r   Divides a single-precision floating-point value in the
                                        low-order doubleword of xmm1 by a corresponding value in
                                        xmm2 or mem32. Writes the quotient to xmm1.

                                 Encoding
Mnemonic                         VEX   RXB.mapselect   W.vvvv.L.pp   Opcode
VDIVSS xmm1, xmm2, xmm3/mem32    C4    RXB.00001       X.src.X.10    5E /r


Related Instructions 

(V)DIVPD, (V)DIVPS, (V)DIVSD 
MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M    M    M    M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.
Division by zero, ZE        S     S     X     Division of finite dividend by zero-value divisor.
Overflow, OE                S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
DPPD, VDPPD

Dot Product Packed Double-Precision Floating-Point

Computes the dot-product of the input operands. An immediate operand specifies both the input
values and the destination locations to which the products are written.

Selectively multiplies packed double-precision values in a source operand by the corresponding
values in a second source operand, writes the results to a temporary location, adds the results,
writes the sum to a second temporary location and selectively writes the sum to a destination.

Mask bits [5:4] of an 8-bit immediate operand perform multiplicative selection. Bit 5 selects bits
[127:64] of the source operands; bit 4 selects bits [63:0] of the source operands. When a mask bit = 1,
the corresponding packed double-precision floating point values are multiplied and the product is
written to the corresponding position of a 128-bit temporary location. When a mask bit = 0, the
corresponding position of the temporary location is cleared.

After the two 64-bit values in the first temporary location are added and written to the 64-bit second
temporary location, mask bits [1:0] of the same 8-bit immediate operand perform write selection. Bit
1 selects bits [127:64] of the destination; bit 0 selects bits [63:0] of the destination. When a mask bit =
1, the 64-bit value of the second temporary location is written to the corresponding position of the
destination. When a mask bit = 0, the corresponding position of the destination is cleared.
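The two-stage use of the immediate byte described above can be sketched in Python as follows. This is an illustrative model only, assuming each 128-bit operand is a list of two doubles (index 0 is bits [63:0]), finite inputs, and masked exceptions; the function name is hypothetical.

```python
def dppd(src1, src2, imm8):
    """Model of DPPD on two-element lists of doubles; returns the 128-bit result."""
    # Multiplicative selection: imm8 bit 4 gates bits [63:0], bit 5 gates bits [127:64].
    temp1_low = src1[0] * src2[0] if imm8 & 0x10 else 0.0
    temp1_high = src1[1] * src2[1] if imm8 & 0x20 else 0.0
    # The two products are summed; the lower-indexed addend is the first operand.
    temp2 = temp1_low + temp1_high
    # Write selection: imm8 bit 0 gates bits [63:0], bit 1 gates bits [127:64].
    return [temp2 if imm8 & 0x01 else 0.0,
            temp2 if imm8 & 0x02 else 0.0]
```

With `imm8 = 0x31` both products contribute and the sum is written only to the low quadword, so `dppd([1.0, 2.0], [3.0, 4.0], 0x31)` gives `[11.0, 0.0]`.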

When the operation produces a NaN, its value is determined as follows. 


Source Operands (in either order)                                          NaN Result (note 1)
QNaN    Any non-NaN floating-point value (or single-operand instruction)   Value of QNaN
SNaN    Any non-NaN floating-point value (or single-operand instruction)   Value of SNaN, converted to a QNaN (note 2)
QNaN    QNaN                                                               First operand
QNaN    SNaN                                                               First operand (converted to QNaN if SNaN)
SNaN    SNaN                                                               First operand converted to a QNaN (note 2)

Note: 1. A NaN result produced when the floating-point invalid-operation exception is masked.
      2. The conversion is done by changing the most-significant fraction bit to 1.
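The selection rule in the table and the SNaN-to-QNaN conversion in note 2 can be illustrated on raw 64-bit patterns. The sketch below assumes the IEEE 754 double-precision layout (11-bit exponent, 52-bit fraction, so the most-significant fraction bit is bit 51); the helper names are hypothetical and the code models only the masked-exception case.

```python
EXP_MASK = 0x7FF0000000000000    # exponent field of a double
FRAC_MASK = 0x000FFFFFFFFFFFFF   # fraction field of a double
QUIET_BIT = 1 << 51              # most-significant fraction bit

def is_nan(bits: int) -> bool:
    """NaN: exponent all ones and a nonzero fraction."""
    return (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0

def is_snan(bits: int) -> bool:
    """Signaling NaN: a NaN whose most-significant fraction bit is 0."""
    return is_nan(bits) and (bits & QUIET_BIT) == 0

def quiet(bits: int) -> int:
    """Convert an SNaN to a QNaN by setting the most-significant fraction bit."""
    return bits | QUIET_BIT

def nan_result(first: int, second: int):
    """Return the propagated NaN pattern per the table, or None if neither is a NaN."""
    if is_nan(first):
        return quiet(first)      # first operand wins when both are NaNs
    if is_nan(second):
        return quiet(second)
    return None
```

For example, `nan_result(0x7FF0000000000001, 0x7FF8000000000002)` returns `0x7FF8000000000001`: the first operand is chosen and its quiet bit is set.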


For each addition occurring in either the second or third step, for the purpose of NaN propagation, the
addend of lower bit index is considered to be the first of the two operands. For example, when both
multiplications produce NaNs, the one that corresponds to bits [63:0] is written to all indicated fields
of the destination, regardless of how those NaNs were generated from the sources. When the
high-order multiplication produces NaNs and the low-order multiplication produces infinities of
opposite signs, the real indefinite QNaN (produced as the sum of the infinities) is written to the
destination.

NaNs in source operands or in computational results result in at least one NaN in the destination. For
the 256-bit version, NaNs are propagated within the two independent dot product operations only to
their respective 128-bit results.




There are legacy and extended forms of the instruction:

DPPD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VDPPD 

The extended form of the instruction has a single 128-bit encoding.

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

Instruction Support 


Form      Subset   Feature Flag
DPPD      SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VDPPD     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 


Mnemonic                        Opcode              Description
DPPD xmm1, xmm2/mem128, imm8    66 0F 3A 41 /r ib   Selectively multiplies packed double-precision
                                                    floating-point values in xmm2 or mem128 by
                                                    corresponding values in xmm1, adds interim
                                                    products, selectively writes results to xmm1.

                                       Encoding
Mnemonic                               VEX   RXB.mapselect   W.vvvv.L.pp   Opcode
VDPPD xmm1, xmm2, xmm3/mem128, imm8    C4    RXB.00011       X.src.0.01    41 /r ib


Related Instructions 

(V)DPPS 

MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.
      Exceptions are determined separately for each add-multiply operation.
      Unmasked exceptions do not affect the destination.
Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.
Overflow, OE                S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
DPPS, VDPPS

Dot Product Packed Single-Precision Floating-Point

Computes the dot-product of the input operands. An immediate operand specifies both the input
values and the destination locations to which the products are written.

Selectively multiplies packed single-precision values in a source operand by corresponding values in
a second source operand, writes results to a temporary location, adds pairs of results, writes the sums
to additional temporary locations, and selectively writes a cumulative sum to a destination.

Mask bits [7:4] of an 8-bit immediate operand perform multiplicative selection. Each bit selects a
32-bit segment of the source operands; bit 7 selects bits [127:96], bit 6 selects bits [95:64], bit 5
selects bits [63:32], and bit 4 selects bits [31:0]. When a mask bit = 1, the corresponding packed
single-precision floating point values are multiplied and the product is written to the corresponding
position of a 128-bit temporary location. When a mask bit = 0, the corresponding position of the
temporary location is cleared.

After multiplication, three pairs of 32-bit values are added and written to temporary locations. 

Bits [63:32] and [31:0] of temporary location 1 are added and written to 32-bit temporary location 2; 
bits [127:96] and [95:64] of temporary location 1 are added and written to 32-bit temporary location 
3; then the contents of temporary locations 2 and 3 are added and written to 32-bit temporary location 
4. 

After addition, mask bits [3:0] of the same 8-bit immediate operand perform write selection. Each bit
selects a 32-bit segment of the destination; bit 3 selects bits [127:96], bit 2 selects bits [95:64],
bit 1 selects bits [63:32], and bit 0 selects bits [31:0]. When a mask bit = 1, the 32-bit value of the
fourth temporary location is written to the corresponding position of the destination.
When a mask bit = 0, the corresponding position of the destination is cleared.


For the 256-bit extended encoding, this process is performed independently on the upper and lower
128 bits of the affected YMM registers.
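The multiply, pairwise-add, and write-selection steps can be modelled in Python as shown below. This is a sketch under stated assumptions: each 128-bit operand is a list of four floats (index 0 is bits [31:0]), exceptions are masked, inputs are finite and in single-precision range, and single-precision rounding is approximated by a round trip through `struct`. The function names are hypothetical.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dpps(src1, src2, imm8):
    """Model of the 128-bit DPPS operation on four-element lists."""
    # Multiplicative selection: imm8 bits [7:4] gate lanes 3..0.
    temp1 = [f32(src1[i] * src2[i]) if imm8 & (0x10 << i) else 0.0
             for i in range(4)]
    temp2 = f32(temp1[0] + temp1[1])   # bits [31:0] + bits [63:32]
    temp3 = f32(temp1[2] + temp1[3])   # bits [95:64] + bits [127:96]
    temp4 = f32(temp2 + temp3)         # cumulative sum
    # Write selection: imm8 bits [3:0] gate destination lanes 3..0.
    return [temp4 if imm8 & (1 << i) else 0.0 for i in range(4)]

def vdpps_ymm(src1, src2, imm8):
    """256-bit form: the same process on the lower and upper 128 bits."""
    return dpps(src1[:4], src2[:4], imm8) + dpps(src1[4:], src2[4:], imm8)
```

For example, `dpps([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], 0xF1)` multiplies all four lanes and writes the sum 70.0 only to the lowest lane.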

When the operation produces a NaN, its value is determined as follows. 


Source Operands (in either order)                                          NaN Result (note 1)
QNaN    Any non-NaN floating-point value (or single-operand instruction)   Value of QNaN
SNaN    Any non-NaN floating-point value (or single-operand instruction)   Value of SNaN, converted to a QNaN (note 2)
QNaN    QNaN                                                               First operand
QNaN    SNaN                                                               First operand (converted to QNaN if SNaN)
SNaN    SNaN                                                               First operand converted to a QNaN (note 2)

Note: 1. A NaN result produced when the floating-point invalid-operation exception is masked.
      2. The conversion is done by changing the most-significant fraction bit to 1.


For each addition occurring in either the second or third step, for the purpose of NaN propagation, the
addend of lower bit index is considered to be the first of the two operands. For example, when all four
multiplications produce NaNs, the one that corresponds to bits [31:0] is written to all indicated fields
of the destination, regardless of how those NaNs were generated from the sources. When the two
highest-order multiplications produce NaNs and the two lowest-order multiplications produce
infinities of opposite signs, the real indefinite QNaN (produced as the sum of the infinities) is written
to the destination.

NaNs in source operands or in computational results result in at least one NaN in the destination. For
the 256-bit version, NaNs are propagated within the two independent dot product operations only to
their respective 128-bit results.

There are legacy and extended forms of the instruction:

DPPS 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VDPPS 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form      Subset   Feature Flag
DPPS      SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VDPPS     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                        Opcode              Description
DPPS xmm1, xmm2/mem128, imm8    66 0F 3A 40 /r ib   Selectively multiplies packed single-precision
                                                    floating-point values in xmm2 or mem128 by
                                                    corresponding values in xmm1, adds interim
                                                    products, selectively writes results to xmm1.

                                       Encoding
Mnemonic                               VEX   RXB.mapselect   W.vvvv.L.pp   Opcode
VDPPS xmm1, xmm2, xmm3/mem128, imm8    C4    RXB.00011       X.src.0.01    40 /r ib
VDPPS ymm1, ymm2, ymm3/mem256, imm8    C4    RXB.00011       X.src.1.01    40 /r ib

Related Instructions

(V)DPPD
MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.
      Exceptions are determined separately for each add-multiply operation.
      Unmasked exceptions do not affect the destination.




Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.
Overflow, OE                S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
EXTRACTPS Extract Packed Single-Precision Floating-Point

VEXTRACTPS

Copies one of four packed single-precision floating-point values from a source XMM register to a 
general purpose register or a 32-bit memory location. 

Bits [1:0] of an immediate byte operand specify the location of the 32-bit value that is copied. 00b
selects the low-order doubleword of the source register (bits [31:0]) and 11b selects the high-order
doubleword (bits [127:96]). Bits [7:2] of the immediate operand are ignored.


There are legacy and extended forms of the instruction:

EXTRACTPS

The source operand is an XMM register. The destination can be a general purpose register or a 32-bit
memory location. A 32-bit single-precision value extracted to a general purpose register is
zero-extended to 64 bits.

VEXTRACTPS

The extended form of the instruction has a single 128-bit encoding.

The source operand is an XMM register. The destination can be a general purpose register or a 32-bit
memory location.
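
The element selection can be modeled in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function name `extractps` and the representation of an XMM register as a list of four floats (index 0 holding bits [31:0]) are assumptions made for the example.

```python
import struct

def extractps(src_lanes, imm8):
    """Model of (V)EXTRACTPS: return the raw 32-bit pattern of the selected lane.

    src_lanes is a hypothetical stand-in for an XMM register: four Python
    floats, with index 0 corresponding to bits [31:0].
    """
    index = imm8 & 0b11  # only bits [1:0] of imm8 are used; bits [7:2] are ignored
    # Encode the value as IEEE 754 single precision and reinterpret it as an
    # unsigned 32-bit integer, which is what lands in reg32/mem32.
    (bits,) = struct.unpack("<I", struct.pack("<f", src_lanes[index]))
    return bits  # a non-negative int, so it is already "zero-extended"

xmm1 = [1.0, 2.0, 3.0, 4.0]
low = extractps(xmm1, 0b00)   # selects bits [31:0]
high = extractps(xmm1, 0xFF)  # upper immediate bits are ignored, so this selects lane 3
```

Because the result is the bit pattern rather than a numeric conversion, 1.0 comes back as 3F800000h, not as the integer 1.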

Instruction Support 


Form        Subset  Feature Flag
EXTRACTPS   SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VEXTRACTPS  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 


Mnemonic                            Opcode             Description
EXTRACTPS reg32/mem32, xmm1, imm8   66 0F 3A 17 /r ib  Extract the single-precision floating-point
                                                       element of xmm1 specified by imm8 to
                                                       reg32/mem32.

Mnemonic                            Encoding
                                    VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VEXTRACTPS reg32/mem32, xmm1, imm8  C4   RXB.00011       X.1111.0.01  17 /r ib

Related Instructions

(V)INSERTPS
Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
EXTRQ Extract Field From Register

Extracts specified bits from the lower 64 bits of the first operand (the destination XMM register). The
extracted bits are saved in the least-significant bit positions of the lower quadword of the destination;
the remaining bits in the lower quadword of the destination register are cleared to 0. The upper
quadword of the destination register is undefined.

The portion of the source data being extracted is defined by the bit index and the field length. The bit
index defines the least-significant bit of the source operand being extracted. Bits
[bit index + field length - 1]:[bit index] are extracted. If the sum of the bit index and the field length
is greater than 64, the results are undefined.

For example, if the bit index is 32 (20h) and the field length is 16 (10h), then the result in the
destination register will be source [47:32] in bits 15:0, with zeros in bits 63:16.

A value of zero in the field length is defined as a length of 64. If the length field is 0 and the
bit index is 0, bits 63:0 of the source are extracted. For any other value of the bit index, the results are
undefined.

The bit index and field length can be specified as immediate values (second and first immediate
operands, respectively, in the case of the three argument version of the instruction), or they can both be
specified by fields in an XMM source operand. In the latter case, bits [5:0] of the XMM register
specify the number of bits to extract (the field length) and bits [13:8] of the XMM register specify the
index of the first bit in the field to extract. The bit index and field length are each six bits in length;
other bits of the field are ignored.
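
The shift-and-mask behavior described above can be sketched in Python. This is a minimal model under stated assumptions: the function names `extrq` and `extrq_reg` are hypothetical, only the lower quadword is modeled (the upper quadword is undefined), and cases the text calls undefined are reported by raising an exception rather than by guessing a result.

```python
def extrq(dest_low64, field_length, bit_index):
    """Model of EXTRQ on the lower quadword of the destination register."""
    length = field_length & 0x3F  # only six bits of each field are used
    index = bit_index & 0x3F
    if length == 0:
        length = 64               # a field length of zero means 64 bits
    if index + length > 64:
        # The manual leaves this case undefined; the sketch refuses to guess.
        raise ValueError("bit index + field length exceeds 64: result undefined")
    mask = (1 << length) - 1
    return (dest_low64 >> index) & mask  # shift right, then mask to field length

def extrq_reg(dest_low64, xmm2_low64):
    """Register form: length in xmm2[5:0], bit index in xmm2[13:8]."""
    return extrq(dest_low64, xmm2_low64 & 0x3F, (xmm2_low64 >> 8) & 0x3F)

source = 0x123456789ABCDEF0
field = extrq(source, 16, 32)             # bit index 32, field length 16
same = extrq_reg(source, (32 << 8) | 16)  # same request via the register form
whole = extrq(source, 0, 0)               # length 0 with index 0 extracts all 64 bits
```

With the example values from the text (bit index 20h, field length 10h), the result is source[47:32] in bits 15:0 with zeros above.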

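The diagram below illustrates the operation of this instruction.

[Figure: EXTRQ operation. The lower quadword (bits 63:0) of XMM1 is shifted right by the bit index
held in XMM2[13:8] and masked to the field length held in XMM2[5:0]; the result is written to the
lower quadword of XMM1.]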
Instruction Support

Form   Subset  Feature Flag
EXTRQ  SSE4A   CPUID Fn8000_0001_ECX[SSE4A] (bit 6)

Software must check the CPUID bit once per program or library initialization before using the
instruction, or inconsistent behavior may result. For more on using the CPUID instruction to obtain
processor feature support information, see Appendix E of Volume 3.


Instruction Encoding 


Mnemonic                Opcode             Description
EXTRQ xmm1, imm8, imm8  66 0F 78 /0 ib ib  Extract field from xmm1, with the least significant bit
                                           of the extracted data starting at the bit index
                                           specified by [5:0] of the second immediate byte, with
                                           the length specified by [5:0] of the first immediate
                                           byte.
EXTRQ xmm1, xmm2        66 0F 79 /r        Extract field from xmm1, with the least significant bit
                                           of the extracted data starting at the bit index
                                           specified by xmm2[13:8], with the length specified
                                           by xmm2[5:0].


Related Instructions 

INSERTQ, PINSRW, PEXTRW 

rFLAGS Affected 

None 


Exceptions 


Exception                  Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          SSE4A instructions are not supported, as indicated by
                                                          CPUID Fn8000_0001_ECX[SSE4A] = 0.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit
                                                          (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
HADDPD Horizontal Add Packed Double-Precision Floating-Point

VHADDPD

Adds adjacent pairs of double-precision floating-point values in two source operands and writes the 
sums to a destination. 

There are legacy and extended forms of the instruction: 

HADDPD 

Adds the packed double-precision values in bits [127:64] and bits [63:0] of the first source XMM
register and writes the sum to bits [63:0] of the destination; adds the corresponding values of the
second source XMM register or a 128-bit memory location and writes the sum to bits [127:64] of the
destination. The first source register is also the destination. Bits [255:128] of the YMM register that
corresponds to the destination are not affected.

VHADDPD

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

Adds the packed double-precision values in bits [127:64] and bits [63:0] of the first source XMM
register and writes the sum to bits [63:0] of the destination XMM register; adds the corresponding
values of the second source XMM register or a 128-bit memory location and writes the sum to bits
[127:64] of the destination. Bits [255:128] of the YMM register that corresponds to the destination
are cleared.

YMM Encoding

Adds the packed double-precision values in bits [127:64] and bits [63:0] of the first source
YMM register and writes the sum to bits [63:0] of the destination YMM register; adds the
corresponding values of the second source YMM register or a 256-bit memory location and writes
the sum to bits [127:64] of the destination. Performs the same process for the upper 128 bits of the
sources and destination.
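
The lane arithmetic of the two encodings can be sketched as follows. This is an illustrative model only: the function names are hypothetical, a register is represented as a Python list of doubles with index 0 holding bits [63:0], and MXCSR rounding control and exception reporting are not modeled.

```python
def haddpd(src1, src2):
    """128-bit form: each result element is the sum of one source's two elements."""
    return [src1[0] + src1[1],   # bits [63:0]   <- first source, low + high
            src2[0] + src2[1]]   # bits [127:64] <- second source, low + high

def vhaddpd_256(src1, src2):
    """256-bit form: the 128-bit operation is applied to each 128-bit half."""
    return haddpd(src1[0:2], src2[0:2]) + haddpd(src1[2:4], src2[2:4])

xmm_result = haddpd([1.0, 2.0], [10.0, 20.0])
ymm_result = vhaddpd_256([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
```

Note that the 256-bit form does not add across the 128-bit boundary; the lower and upper halves are processed independently.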

Instruction Support 


Form     Subset  Feature Flag
HADDPD   SSE3    CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VHADDPD  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                         Opcode       Description
HADDPD xmm1, xmm2/mem128         66 0F 7C /r  Adds adjacent pairs of double-precision values in xmm1
                                              and xmm2 or mem128. Writes the sums to xmm1.

Mnemonic                         Encoding
                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VHADDPD xmm1, xmm2, xmm3/mem128  C4   RXB.00001       X.src.0.01   7C /r
VHADDPD ymm1, ymm2, ymm3/mem256  C4   RXB.00001       X.src.1.01   7C /r
Related Instructions

(V)HADDPS, (V)HSUBPD, (V)HSUBPS

MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                     M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.


Exceptions 


Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
                           S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                             see SIMD Floating-Point Exceptions below for details.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF   S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                             see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
Overflow, OE               S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE              S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE              S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
HADDPS Horizontal Add Packed Single-Precision

VHADDPS

Adds adjacent pairs of single-precision floating-point values in two source operands and writes the 
sums to a destination. 

There are legacy and extended forms of the instruction: 

HADDPS 

Adds the packed single-precision values in bits [63:32] and bits [31:0] of the first source XMM
register and writes the sum to bits [31:0] of the destination; adds the packed single-precision values in
bits [127:96] and bits [95:64] of the first source register and writes the sum to bits [63:32] of the
destination. Adds the corresponding values in the second source XMM register or a 128-bit memory
location and writes the sums to bits [95:64] and [127:96] of the destination. The first source register is
also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.

VHADDPS

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

Adds the packed single-precision values in bits [63:32] and bits [31:0] of the first source XMM
register and writes the sum to bits [31:0] of the destination XMM register; adds the packed
single-precision values in bits [127:96] and bits [95:64] of the first source register and writes the sum to
bits [63:32] of the destination. Adds the corresponding values in the second source XMM register or a
128-bit memory location and writes the sums to bits [95:64] and [127:96] of the destination. Bits
[255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

Adds the packed single-precision values in bits [63:32] and bits [31:0] of the first source YMM
register and writes the sum to bits [31:0] of the destination YMM register; adds the packed
single-precision values in bits [127:96] and bits [95:64] of the first source register and writes the sum to
bits [63:32] of the destination. Adds the corresponding values in the second source YMM register or a
256-bit memory location and writes the sums to bits [95:64] and [127:96] of the destination. Performs
the same process for the upper 128 bits of the sources and destination.
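
The element ordering of the result is easy to get wrong, so a small Python model may help. It is a sketch under assumptions: the names `f32`, `haddps`, and `vhaddps_256` are hypothetical, registers are lists of floats with index 0 holding bits [31:0], and results are rounded to single precision with `struct` (round-to-nearest only; MXCSR state and exceptions are not modeled).

```python
import struct

def f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def haddps(src1, src2):
    """128-bit form: the low half of the result comes from src1, the high half from src2."""
    return [f32(src1[0] + src1[1]),   # bits [31:0]
            f32(src1[2] + src1[3]),   # bits [63:32]
            f32(src2[0] + src2[1]),   # bits [95:64]
            f32(src2[2] + src2[3])]   # bits [127:96]

def vhaddps_256(src1, src2):
    """256-bit form: the 128-bit operation is repeated on the upper 128 bits."""
    return haddps(src1[0:4], src2[0:4]) + haddps(src1[4:8], src2[4:8])

xmm_result = haddps([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
```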

Instruction Support 


Form     Subset  Feature Flag
HADDPS   SSE3    CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VHADDPS  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                         Opcode       Description
HADDPS xmm1, xmm2/mem128         F2 0F 7C /r  Adds adjacent pairs of single-precision values in xmm1
                                              and xmm2 or mem128. Writes the sums to xmm1.

Mnemonic                         Encoding
                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VHADDPS xmm1, xmm2, xmm3/mem128  C4   RXB.00001       X.src.0.11   7C /r
VHADDPS ymm1, ymm2, ymm3/mem256  C4   RXB.00001       X.src.1.11   7C /r

Related Instructions

(V)HADDPD, (V)HSUBPD, (V)HSUBPS

MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                     M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.
Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
                           S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                             see SIMD Floating-Point Exceptions below for details.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF   S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                             see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
Overflow, OE               S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE              S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE              S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
HSUBPD Horizontal Subtract Packed Double-Precision

VHSUBPD

Subtracts adjacent pairs of double-precision floating-point values in two source operands and writes
the differences to a destination.

There are legacy and extended forms of the instruction: 

HSUBPD 

The first source register is also the destination. 

Subtracts the packed double-precision value in bits [127:64] from the value in bits [63:0] of the first
source XMM register and writes the difference to bits [63:0] of the destination; subtracts the
corresponding values of the second source XMM register or a 128-bit memory location and writes the
difference to bits [127:64] of the destination. Bits [255:128] of the YMM register that corresponds to the
destination are not affected.

VHSUBPD

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

Subtracts the packed double-precision value in bits [127:64] from the value in bits [63:0] of the first
source XMM register and writes the difference to bits [63:0] of the destination XMM register;
subtracts the corresponding values of the second source XMM register or a 128-bit memory location and
writes the difference to bits [127:64] of the destination. Bits [255:128] of the YMM register that
corresponds to the destination are cleared.

YMM Encoding

Subtracts the packed double-precision value in bits [127:64] from the value in bits [63:0] of
the first source YMM register and writes the difference to bits [63:0] of the destination YMM
register; subtracts the corresponding values of the second source YMM register or a 256-bit memory
location and writes the difference to bits [127:64] of the destination. Performs the same process for the
upper 128 bits of the sources and destination.
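
Unlike addition, the operand order matters here: the upper element of each pair is subtracted from the lower one. A minimal Python sketch, assuming hypothetical function names and a register modeled as a list of doubles with index 0 holding bits [63:0] (MXCSR behavior is not modeled):

```python
def hsubpd(src1, src2):
    """128-bit form: low element minus high element, for each source."""
    return [src1[0] - src1[1],   # bits [63:0]   <- src1[63:0] - src1[127:64]
            src2[0] - src2[1]]   # bits [127:64] <- src2[63:0] - src2[127:64]

def vhsubpd_256(src1, src2):
    """256-bit form: the same operation on each 128-bit half."""
    return hsubpd(src1[0:2], src2[0:2]) + hsubpd(src1[2:4], src2[2:4])

xmm_result = hsubpd([5.0, 2.0], [10.0, 4.0])
ymm_result = vhsubpd_256([5.0, 2.0, 1.0, 8.0], [10.0, 4.0, 0.0, 3.0])
```

Swapping the two elements of a source pair flips the sign of the corresponding result element, as the third element of `ymm_result` shows.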

Instruction Support 


Form     Subset  Feature Flag
HSUBPD   SSE3    CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VHSUBPD  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                         Opcode       Description
HSUBPD xmm1, xmm2/mem128         66 0F 7D /r  Subtracts adjacent pairs of double-precision
                                              floating-point values in xmm1 and xmm2 or mem128.
                                              Writes the differences to xmm1.

Mnemonic                         Encoding
                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VHSUBPD xmm1, xmm2, xmm3/mem128  C4   RXB.00001       X.src.0.01   7D /r
VHSUBPD ymm1, ymm2, ymm3/mem256  C4   RXB.00001       X.src.1.01   7D /r

Related Instructions

(V)HSUBPS, (V)HADDPD, (V)HADDPS

MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                     M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.
Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
                           S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                             see SIMD Floating-Point Exceptions below for details.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF   S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                             see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
Overflow, OE               S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE              S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE              S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
HSUBPS Horizontal Subtract Packed Single

VHSUBPS 

Subtracts adjacent pairs of single-precision floating-point values in two source operands and writes 
the differences to a destination. 

There are legacy and extended forms of the instruction: 

HSUBPS 

Subtracts the packed single-precision value in bits [63:32] from the value in bits [31:0] of the first
source XMM register and writes the difference to bits [31:0] of the destination; subtracts the packed
single-precision value in bits [127:96] from the value in bits [95:64] of the first source register and
writes the difference to bits [63:32] of the destination. Subtracts the corresponding values of the
second source XMM register or a 128-bit memory location and writes the differences to bits [95:64] and
[127:96] of the destination. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.

VHSUBPS

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

Subtracts the packed single-precision value in bits [63:32] from the value in bits [31:0] of the first
source XMM register and writes the difference to bits [31:0] of the destination XMM register;
subtracts the packed single-precision value in bits [127:96] from the value in bits [95:64] of the first
source register and writes the difference to bits [63:32] of the destination. Subtracts the corresponding
values of the second source XMM register or a 128-bit memory location and writes the differences to
bits [95:64] and [127:96] of the destination. Bits [255:128] of the YMM register that corresponds to
the destination are cleared.

YMM Encoding

Subtracts the packed single-precision value in bits [63:32] from the value in bits [31:0] of the first
source YMM register and writes the difference to bits [31:0] of the destination YMM register;
subtracts the packed single-precision value in bits [127:96] from the value in bits [95:64] of the first
source register and writes the difference to bits [63:32] of the destination. Subtracts the corresponding
values of the second source YMM register or a 256-bit memory location and writes the differences to
bits [95:64] and [127:96] of the destination. Performs the same process for the upper 128 bits of the
sources and destination.
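
The pairing and operand order can be summarized in a short Python model. As with any such sketch, the names are hypothetical, a register is a list of floats with index 0 holding bits [31:0], and single-precision rounding is approximated with `struct` (round-to-nearest only; MXCSR state and exceptions are not modeled).

```python
import struct

def f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def hsubps(src1, src2):
    """128-bit form: for each adjacent pair, lower element minus upper element."""
    return [f32(src1[0] - src1[1]),   # bits [31:0]
            f32(src1[2] - src1[3]),   # bits [63:32]
            f32(src2[0] - src2[1]),   # bits [95:64]
            f32(src2[2] - src2[3])]   # bits [127:96]

def vhsubps_256(src1, src2):
    """256-bit form: the 128-bit operation is repeated on the upper 128 bits."""
    return hsubps(src1[0:4], src2[0:4]) + hsubps(src1[4:8], src2[4:8])

xmm_result = hsubps([5.0, 2.0, 9.0, 4.0], [10.0, 4.0, 8.0, 1.0])
```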

Instruction Support 


Form     Subset  Feature Flag
HSUBPS   SSE3    CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VHSUBPS  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                         Opcode       Description
HSUBPS xmm1, xmm2/mem128         F2 0F 7D /r  Subtracts adjacent pairs of values in xmm1 and xmm2
                                              or mem128. Writes differences to xmm1.

Mnemonic                         Encoding
                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VHSUBPS xmm1, xmm2, xmm3/mem128  C4   RXB.00001       X.src.0.11   7D /r
VHSUBPS ymm1, ymm2, ymm3/mem256  C4   RXB.00001       X.src.1.11   7D /r

Related Instructions

(V)HSUBPD, (V)HADDPD, (V)HADDPS

MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                     M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.
Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
                           S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                             see SIMD Floating-Point Exceptions below for details.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF   S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                             see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
Overflow, OE               S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE              S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE              S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
INSERTPS Insert Packed Single-Precision Floating-Point

VINSERTPS

Copies a selected single-precision floating-point value from a source operand to a selected location in 
a destination register and optionally clears selected elements of the destination. The legacy and 
extended forms of the instruction treat the remaining elements of the destination in different ways. 

Selections are specified by three fields of an immediate 8-bit operand: 


  7     6     5     4     3     2     1     0
   COUNT_S     COUNT_D           ZMASK

COUNT_S — The binary value of the field specifies a 32-bit element of a source register, counting
upward from the low-order doubleword. COUNT_S is used only for register source; when the source
is a memory operand, COUNT_S = 0.

COUNT_D — The binary value of the field specifies a 32-bit destination element, counting upward
from the low-order doubleword.

ZMASK — Set a bit to clear a 32-bit element of the destination.

There are legacy and extended forms of the instruction:

INSERTPS

The source operand is either an XMM register or a 32-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.

When the source operand is a register, the instruction copies the 32-bit element of the source specified
by COUNT_S to the location in the destination specified by COUNT_D, and clears destination elements
as specified by ZMASK. Elements of the destination that are not cleared are not affected.

When the source operand is a memory location, the instruction copies a 32-bit value from memory to
the location in the destination specified by COUNT_D, and clears destination elements as specified by
ZMASK. Elements of the destination that are not cleared are not affected.

VINSERTPS

The extended form of the instruction has a 128-bit encoding only.

The first source operand is an XMM register and the second source operand is either an XMM
register or a 32-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

When the second source operand is a register, the instruction copies the 32-bit element of the source
specified by COUNT_S to the location in the destination specified by COUNT_D. The other elements of
the destination are either copied from the first source operand or cleared as specified by ZMASK.

When the second source operand is a memory location, the instruction copies a 32-bit value from the
source to the location in the destination specified by COUNT_D. The other elements of the destination
are either copied from the first source operand or cleared as specified by ZMASK.
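
The way the three immediate fields interact can be sketched in Python. This is an illustrative model only: the function name `insertps`, the `src_is_memory` flag, and the list-of-floats register representation (index 0 is the low-order doubleword) are assumptions made for the example. The sketch applies the zero mask after the insertion, so a mask bit can also clear the element that was just inserted.

```python
def insertps(first, src, imm8, src_is_memory=False):
    """Model of (V)INSERTPS on four 32-bit elements.

    first: the destination (legacy form) or first source operand (extended form).
    src:   four elements for a register source, or a one-element list for mem32.
    """
    count_s = 0 if src_is_memory else (imm8 >> 6) & 0b11  # imm8[7:6], register source only
    count_d = (imm8 >> 4) & 0b11                          # imm8[5:4]
    zmask = imm8 & 0b1111                                 # imm8[3:0]

    result = list(first)              # untouched elements come from `first`
    result[count_d] = src[count_s]    # insert the selected source element
    for i in range(4):
        if zmask & (1 << i):          # each set mask bit clears one element
            result[i] = 0.0
    return result

reg_form = insertps([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], 0b10_01_0001)
mem_form = insertps([1.0, 2.0, 3.0, 4.0], [99.0], 0b11_11_0000, src_is_memory=True)
```

In the first call, element 2 of the source goes to element 1 of the result and element 0 is cleared; in the second, the COUNT_S bits are ignored because the source is a memory operand.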

Instruction Support 


Form       Subset  Feature Flag
INSERTPS   SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VINSERTPS  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                           Opcode              Description
INSERTPS xmm1, xmm2/mem32, imm8    66 0F 3A 21 /r ib   Insert a selected single-precision floating-point value from xmm2 or from mem32 at a selected location in xmm1 and clear selected elements of xmm1. Selections specified by imm8.

Mnemonic                                  Encoding
                                          VEX  RXB.mapselect  W.vvvv.L.pp  Opcode
VINSERTPS xmm1, xmm2, xmm3/mem128, imm8   C4   RXB.00011      X.src.0.01   21 /r ib

Related Instructions 

(V)EXTRACTPS 


Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
INSERTQ Insert Field

Inserts bits from the lower 64 bits of the source operand into the lower 64 bits of the destination
operand. No other bits in the lower 64 bits of the destination are modified. The upper 64 bits of the
destination are undefined.

The least-significant l bits of the source operand are inserted into the destination, with the
least-significant bit of the source operand inserted at bit position n, where l and n are defined as the
field length and bit index, respectively.

Bits (field length - 1):0 of the source operand are inserted into bits (bit index + field length - 1):(bit
index) of the destination. If the sum of the bit index and the field length is greater than 64, the
results are undefined.

For example, if the bit index is 32 (20h) and the field length is 16 (10h), then the result in the
destination register will be source operand[15:0] in bits 47:32. Bits 63:48 and bits 31:0 are not modified.

A value of zero in the field length is defined as a length of 64. If the length field is 0 and the bit index
is 0, bits 63:0 of the source operand are inserted. For any other value of the bit index, the results are
undefined.

The bits to insert are located in the XMM2 source operand. The bit index and field length can be
specified as immediate values or can be specified in the XMM source operand. In the immediate form, the
bit index and the field length are specified by the fourth (second immediate byte) and third operands
(first immediate byte), respectively. In the register form, the bit index and field length are specified in
bits [77:72] and bits [69:64] of the source XMM register, respectively. The bit index and field length
are each six bits in length; other bits in the field are ignored.
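
The field insertion described above can be sketched as a Python model operating on the low 64 bits of each operand. The names `insertq` and `decode_register_controls` are hypothetical, and raising `ValueError` for the cases the text calls undefined is a choice made for this sketch; real hardware simply produces undefined results.

```python
MASK64 = (1 << 64) - 1

def insertq(dest_low64, src_low64, field_length, bit_index):
    """Model of the INSERTQ field insert on the low 64 bits (illustrative only)."""
    length = field_length & 0x3F      # only six bits of each control are used
    index = bit_index & 0x3F
    if length == 0:
        length = 64                   # a field length of zero means 64
    if index + length > 64:
        raise ValueError("results are undefined when bit index + field length > 64")
    field_mask = (1 << length) - 1
    cleared = dest_low64 & ~(field_mask << index) & MASK64
    return cleared | ((src_low64 & field_mask) << index)

def decode_register_controls(src128):
    """Register form: field length in bits [69:64], bit index in bits [77:72]."""
    field_length = (src128 >> 64) & 0x3F
    bit_index = (src128 >> 72) & 0x3F
    return field_length, bit_index
```

With a bit index of 32 and a field length of 16, the model places source bits 15:0 in destination bits 47:32 and leaves bits 63:48 and 31:0 unchanged, matching the worked example in the text.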

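The diagram below illustrates the operation of this instruction.

[Figure: INSERTQ operation. Inputs are XMM2 (bits 127:0, with the field taken from bits 63:0), the first
imm8 (bits 5:0 used), and the second imm8 (bits 5:0 used); the result is written to XMM1.]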
Instruction Support

Form       Subset   Feature Flag
INSERTQ    SSE4A    CPUID Fn8000_0001_ECX[SSE4A] (bit 6)


Software must check the CPUID bit once per program or library initialization before using the 
instruction, or inconsistent behavior may result. For more on using the CPUID instruction to obtain 
processor feature support information, see Appendix E of Volume 3. 
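
A one-time feature check of the kind described above might be structured as follows. This is only a sketch: Python cannot execute CPUID directly, so `read_ecx` is a hypothetical callable that returns the ECX value of CPUID Fn8000_0001, and the class name `FeatureCache` is likewise an assumption.

```python
SSE4A_BIT = 6  # CPUID Fn8000_0001_ECX[SSE4A]

class FeatureCache:
    """Query the feature bit once and reuse the answer afterwards."""

    def __init__(self, read_ecx):
        self._read_ecx = read_ecx   # hypothetical CPUID accessor
        self._sse4a = None          # not yet queried

    def sse4a(self):
        if self._sse4a is None:
            ecx = self._read_ecx()
            self._sse4a = bool((ecx >> SSE4A_BIT) & 1)
        return self._sse4a
```

Caching the result at initialization keeps every later use of the instruction consistent with the same answer.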


Instruction Encoding

Mnemonic                          Opcode              Description
INSERTQ xmm1, xmm2, imm8, imm8    F2 0F 78 /r ib ib   Insert field starting at bit 0 of xmm2 with the length specified by [5:0] of the first immediate byte. This field is inserted into xmm1 starting at the bit position specified by [5:0] of the second immediate byte.
INSERTQ xmm1, xmm2                F2 0F 79 /r         Insert field starting at bit 0 of xmm2 with the length specified by xmm2[69:64]. This field is inserted into xmm1 starting at the bit position specified by xmm2[77:72].


Related Instructions 

EXTRQ, PINSRW, PEXTRW 

rFLAGS Affected 

None 


Exceptions

Exception                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD         X     X             X          SSE4A instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[SSE4A] = 0.
                            X     X             X          The emulate bit (EM) of CR0 was set to 1.
                            X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
LDDQU, VLDDQU Load Unaligned Double Quadword

Loads unaligned double quadwords from a memory location to a destination register. 

Like the (V)MOVUPD instructions, (V)LDDQU loads a 128-bit or 256-bit operand from an 
unaligned memory location. However, to improve performance when the memory operand is actually 
misaligned, (V)LDDQU may read an aligned 16 or 32 bytes to get the first part of the operand, and an 
aligned 16 or 32 bytes to get the second part of the operand. This behavior is implementation-specific, 
and (V)LDDQU may only read the exact 16 or 32 bytes needed for the memory operand. If the memory
operand is in a memory range where reading extra bytes can cause performance or functional
issues, use (V)MOVUPD instead of (V)LDDQU.
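
To make the "extra bytes" concern concrete, the sketch below computes which aligned blocks an implementation could touch for a given operand address, and which of those bytes lie outside the operand. It models only the possibility described above; whether a particular processor actually reads the aligned blocks is implementation-specific, and the function names are hypothetical.

```python
def aligned_blocks(address, size=16):
    """Start addresses of the aligned blocks that cover [address, address + size)."""
    first = address & ~(size - 1)
    if first == address:
        return [first]                # already aligned: one block suffices
    return [first, first + size]      # misaligned: operand straddles two blocks

def extra_bytes(address, size=16):
    """Addresses inside the covering blocks but outside the operand itself."""
    operand = range(address, address + size)
    touched = []
    for block in aligned_blocks(address, size):
        touched.extend(a for a in range(block, block + size) if a not in operand)
    return touched
```

For a 16-byte operand at 1008h, the covering blocks start at 1000h and 1010h, so up to 16 bytes outside the operand could be read.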

Memory operands that are not aligned on 16-byte or 32-byte boundaries do not cause general-protection
exceptions.

There are legacy and extended forms of the instruction:

LDDQU 

The source operand is an unaligned 128-bit memory location. The destination operand is an XMM 
register. Bits [255:128] of the YMM register that corresponds to the destination register are not 
affected. 

VLDDQU 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

The source operand is an unaligned 128-bit memory location. The destination operand is an XMM 
register. Bits [255:128] of the YMM register that corresponds to the destination register are cleared. 

YMM Encoding 

The source operand is an unaligned 256-bit memory location. The destination operand is a YMM
register.

Instruction Support

Form      Subset   Feature Flag
LDDQU     SSE3     CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VLDDQU    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                Opcode        Description
LDDQU xmm1, mem128      F2 0F F0 /r   Loads a 128-bit value from an unaligned mem128 to xmm1.

Mnemonic                Encoding
                        VEX  RXB.mapselect  W.vvvv.L.pp  Opcode
VLDDQU xmm1, mem128     C4   RXB.00001      X.1111.0.11  F0 /r
VLDDQU ymm1, mem256     C4   RXB.00001      X.1111.1.11  F0 /r
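
The VEX column notation in the encoding table can be read as three prefix bytes. The sketch below packs the fields in the order the table lists them (RXB.mapselect, then W.vvvv.L.pp). The helper name `vex3` is hypothetical, and two points are assumptions for the example: the field values are taken exactly as stored in the prefix (R, X, B, and vvvv are held in inverted form, so 111b and 1111b mean "no extension" and "unused"), and the don't-care W bit (shown as X) is passed as 0.

```python
def vex3(rxb, map_select, w, vvvv, l, pp):
    """Pack the three-byte VEX prefix from its stored field values."""
    byte0 = 0xC4
    byte1 = ((rxb & 0x7) << 5) | (map_select & 0x1F)
    byte2 = ((w & 0x1) << 7) | ((vvvv & 0xF) << 3) | ((l & 0x1) << 2) | (pp & 0x3)
    return bytes([byte0, byte1, byte2])

# VLDDQU xmm1, mem128: C4 RXB.00001 X.1111.0.11, then opcode F0 /r
prefix_128 = vex3(0b111, 0b00001, 0, 0b1111, 0, 0b11)
# VLDDQU ymm1, mem256: the same fields with L = 1
prefix_256 = vex3(0b111, 0b00001, 0, 0b1111, 1, 0b11)
```

Under these assumptions the 128-bit form begins C4 E1 7B and the 256-bit form begins C4 E1 7F, each followed by the opcode byte F0 and the ModRM byte.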
Related Instructions

(V)MOVDQU 

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     X     Write to a read-only data segment.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     X     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
LDMXCSR, VLDMXCSR Load MXCSR Control/Status Register

Loads the MXCSR register with a 32-bit value from memory. 

For both legacy LDMXCSR and extended VLDMXCSR forms of the instruction, the source operand 
is a 32-bit memory location and the destination operand is the MXCSR. 

If an MXCSR load clears a SIMD floating-point exception mask bit and sets the corresponding
exception flag bit, a SIMD floating-point exception is not generated immediately. An exception is
generated only upon execution of the next instruction that operates on an XMM or YMM register
operand and causes that particular SIMD floating-point exception to be reported.

A general protection exception occurs if the instruction attempts to load non-zero values into reserved 
MXCSR bits. Software can use MXCSR_MASK to determine which bits are reserved. For details, 
see “128-Bit, 64-Bit, and x87 Programming” in Volume 2. 
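
The reserved-bit check can be sketched as a simple mask test. This is a model of the rule only, not of the instruction: the function name `ldmxcsr_faults` is hypothetical, and treating an MXCSR_MASK value of zero as the default mask 0000_FFBFh is an assumption based on the usual FXSAVE convention; consult Volume 2 for the authoritative definition.

```python
DEFAULT_MXCSR_MASK = 0x0000FFBF  # assumed default when MXCSR_MASK reads as zero

def ldmxcsr_faults(value, mxcsr_mask=DEFAULT_MXCSR_MASK):
    """Return True if loading `value` would set a reserved MXCSR bit (#GP)."""
    if mxcsr_mask == 0:
        mxcsr_mask = DEFAULT_MXCSR_MASK
    reserved_bits_set = value & ~mxcsr_mask & 0xFFFFFFFF
    return reserved_bits_set != 0
```

For instance, the value 1F80h (all exception masks set, no flags) passes under the default mask, while a value with bit 17 (MM) set passes only if the mask reports that bit as supported.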

The MXCSR register is described in “Registers” in Volume 1. 

Instruction Support

Form        Subset   Feature Flag
LDMXCSR     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VLDMXCSR    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic           Opcode     Description
LDMXCSR mem32      0F AE /2   Loads MXCSR register with 32-bit value from memory.

Mnemonic           Encoding
                   VEX  RXB.mapselect  W.vvvv.L.pp  Opcode
VLDMXCSR mem32     C4   RXB.00001      X.1111.0.00  AE /2


Related Instructions 

(V)STMXCSR 

MXCSR Flags Affected

MM  FZ  RC      PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
M   M   M   M   M   M   M   M   M   M   M    M   M   M   M   M   M
17  15  14  13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 
Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     VEX.L = 1.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Null data segment used to reference memory.
                            S     S     X     Attempt to load non-zero values into reserved MXCSR bits.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
MASKMOVDQU, VMASKMOVDQU Masked Move Double Quadword Unaligned

Moves bytes from the first source operand to a memory location specified by the DS:rDI register. 
Bytes are selected by mask bits in the second source operand. The memory location may be 
unaligned. 

The mask consists of the most significant bit of each byte of the second source register. 

When a mask bit = 1, the corresponding byte of the first source register is written to the destination; 
when a mask bit = 0, the corresponding byte is not written. 

Exception and trap behavior for elements not selected for storage to memory is implementation 
dependent. For instance, a given implementation may signal a data breakpoint or a page fault for 
bytes that are zero-masked and not actually written. 
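
The byte-selection rule can be sketched as follows. The model covers only which bytes are written; it does not model write-combining, memory ordering, or the implementation-dependent fault behavior for masked-off bytes. The function name and the use of a `bytearray` to stand in for the memory at DS:rDI are assumptions for the example.

```python
def maskmovdqu(memory, address, source, mask):
    """Write source byte i to memory[address + i] when bit 7 of mask byte i is set."""
    for i in range(16):
        if mask[i] & 0x80:          # the mask bit is the most significant bit of each byte
            memory[address + i] = source[i]

# Usage: store only the even-numbered bytes of a 16-byte source.
memory = bytearray(32)
source = bytes(range(1, 17))
mask = bytes(0x80 if i % 2 == 0 else 0x7F for i in range(16))
maskmovdqu(memory, 4, source, mask)
```

Note that a mask byte of 7Fh selects nothing, because only bit 7 of each mask byte is examined.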

The instruction implicitly uses weakly-ordered, write-combining buffering for the data, as described
in “Buffering and Combining Memory Writes” in Volume 2. For data that is shared by multiple
processors, this instruction should be used together with a fence instruction in order to ensure data
coherency (see “Cache and TLB Management” in Volume 2).

There are legacy and extended forms of the instruction: 

MASKMOVDQU 

The first source operand is an XMM register and the second source operand is an XMM register. The 
destination is a 128-bit memory location. 

VMASKMOVDQU 

The extended form of the instruction has a 128-bit encoding only.

The first source operand is an XMM register and the second source operand is an XMM register. The 
destination is a 128-bit memory location. 

Instruction Support

Form           Subset   Feature Flag
MASKMOVDQU     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMASKMOVDQU    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                   Opcode        Description
MASKMOVDQU xmm1, xmm2      66 0F F7 /r   Move bytes selected by a mask value in xmm2 from xmm1 to the memory location specified by DS:rDI.

Mnemonic                   Encoding
                           VEX  RXB.mapselect  W.vvvv.L.pp  Opcode
VMASKMOVDQU xmm1, xmm2     C4   RXB.00001      X.1111.0.01  F7 /r


Related Instructions 

(V)MASKMOVPD, (V)MASKMOVPS 
Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     VEX.L = 1.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
MAXPD, VMAXPD Maximum Packed Double-Precision Floating-Point

Compares each packed double-precision floating-point value of the first source operand to the
corresponding value of the second source operand and writes the numerically greater value into the
corresponding location of the destination.

If both source operands are equal to zero, the value of the second source operand is returned. If either 
operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source 
operand is written to the destination. 
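
The selection rule, including the zero and NaN cases, can be sketched per element as follows. This is a behavioral model under the assumption that invalid-operation exceptions are masked; it does not model MXCSR flag updates or denormal handling, and the function names are hypothetical.

```python
import math

def max_element(src1, src2):
    """Per-element result of the maximum operation with exceptions masked."""
    if math.isnan(src1) or math.isnan(src2):
        return src2                  # any NaN: the second source is returned
    if src1 == 0.0 and src2 == 0.0:
        return src2                  # both zero (either sign): second source
    return src1 if src1 > src2 else src2

def maxpd(src1, src2):
    """Apply the rule to each pair of packed elements."""
    return [max_element(a, b) for a, b in zip(src1, src2)]
```

Because the second source is returned in the zero and NaN cases, the operation is not commutative: swapping the operands can change the result, which matters when a specific NaN or zero-sign outcome is required.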

There are legacy and extended forms of the instruction:

MAXPD 

Compares two pairs of packed double-precision floating-point values. 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VMAXPD 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Compares two pairs of packed double-precision floating-point values. 

The first source operand is an XMM register and the second source operand is either an XMM register
or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

Compares four pairs of packed double-precision floating-point values. 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination is a YMM register. 

Instruction Support

Form      Subset   Feature Flag
MAXPD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMAXPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                           Opcode        Description
MAXPD xmm1, xmm2/mem128            66 0F 5F /r   Compares two pairs of packed double-precision values in xmm1 and xmm2 or mem128 and writes the greater value to the corresponding position in xmm1.

Mnemonic                           Encoding
                                   VEX  RXB.mapselect  W.vvvv.L.pp  Opcode
VMAXPD xmm1, xmm2, xmm3/mem128     C4   RXB.00001      X.src.0.01   5F /r
VMAXPD ymm1, ymm2, ymm3/mem256     C4   RXB.00001      X.src.1.01   5F /r


Related Instructions 

(V)MAXPS, (V)MAXSD, (V)MAXSS, (V)MINPD, (V)MINPS, (V)MINSD, (V)MINSS 

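MXCSR Flags Affected

MM  FZ  RC      PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                                             M   M
17  15  14  13  12  11  10  9   8   7   6    5   4   3   2   1   0
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.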
Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 








26568 — Rev. 3.23—February 2019 




MAXPS, VMAXPS Maximum Packed Single-Precision Floating-Point

Compares each packed single-precision floating-point value of the first source operand to the
corresponding value of the second source operand and writes the numerically greater value into the
corresponding location of the destination.

If both source operands are equal to zero, the value of the second source operand is returned. If either 
operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source 
operand is written to the destination. 

There are legacy and extended forms of the instruction:

MAXPS 

Compares four pairs of packed single-precision floating-point values. 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VMAXPS 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Compares four pairs of packed single-precision floating-point values. 

The first source operand is an XMM register and the second source operand is either an XMM register
or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

Compares eight pairs of packed single-precision floating-point values. 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination is a YMM register. 

Instruction Support

Form      Subset   Feature Flag
MAXPS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMAXPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                           Opcode     Description
MAXPS xmm1, xmm2/mem128            0F 5F /r   Compares four pairs of packed single-precision values in xmm1 and xmm2 or mem128 and writes the greater values to the corresponding positions in xmm1.

Mnemonic                           Encoding
                                   VEX  RXB.mapselect  W.vvvv.L.pp  Opcode
VMAXPS xmm1, xmm2, xmm3/mem128     C4   RXB.00001      X.src.0.00   5F /r
VMAXPS ymm1, ymm2, ymm3/mem256     C4   RXB.00001      X.src.1.00   5F /r


Related Instructions 

(V)MAXPD, (V)MAXSD, (V)MAXSS, (V)MINPD, (V)MINPS, (V)MINSD, (V)MINSS 

MXCSR Flags Affected

MM  FZ  RC      PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                                             M   M
17  15  14  13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 
Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
MAXSD, VMAXSD Maximum Scalar Double-Precision Floating-Point

Compares the scalar double-precision floating-point value in the low-order 64 bits of the first source 
operand to a corresponding value in the second source operand and writes the numerically greater 
value into the low-order 64 bits of the destination. 

If both source operands are equal to zero, the value of the second source operand is returned. If either 
operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source 
operand is written to the destination. 

There are legacy and extended forms of the instruction:

MAXSD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 64-bit memory location. The first source register is also the destination. When the second source is 
a 64-bit memory location, the upper 64 bits of the first source register are copied to the destination. 
Bits [127:64] of the destination are not affected. Bits [255:128] of the YMM register that corresponds 
to the destination are not affected. 

VMAXSD 

The extended form of the instruction has a 128-bit encoding only. 

The first source operand is an XMM register and the second source operand is either an XMM register
or a 64-bit memory location. The destination is an XMM register. When the second source is a
64-bit memory location, the upper 64 bits of the first source register are copied to the destination. Bits
[127:64] of the destination are copied from bits [127:64] of the first source. Bits [255:128] of the
YMM register that corresponds to the destination are cleared.
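
The difference between the legacy and extended forms lies in how the bits above the scalar result are treated. The sketch below models a YMM register as a list of four 64-bit lanes (index 0 holds bits [63:0]); the function names and this representation are assumptions for illustration, and the scalar comparison assumes invalid-operation exceptions are masked.

```python
import math

def scalar_max(src1, src2):
    """Scalar maximum with the second-source rule for NaNs and zeros."""
    if math.isnan(src1) or math.isnan(src2):
        return src2
    if src1 == 0.0 and src2 == 0.0:
        return src2
    return src1 if src1 > src2 else src2

def maxsd_legacy(ymm_dest, src2_low):
    """Legacy MAXSD: only lane 0 changes; lanes 1 to 3 are not affected."""
    result = list(ymm_dest)
    result[0] = scalar_max(ymm_dest[0], src2_low)
    return result

def vmaxsd(src1_xmm, src2_low):
    """VMAXSD: lane 1 is copied from the first source; lanes 2 and 3 are cleared."""
    return [scalar_max(src1_xmm[0], src2_low), src1_xmm[1], 0.0, 0.0]
```

The legacy form therefore preserves whatever was in the upper half of the YMM register, while the extended form always zeroes it.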

Instruction Support

Form      Subset   Feature Flag
MAXSD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMAXSD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                          Opcode        Description
MAXSD xmm1, xmm2/mem64            F2 0F 5F /r   Compares a pair of scalar double-precision values in the low-order 64 bits of xmm1 and xmm2 or mem64 and writes the greater value to the low-order 64 bits of xmm1.

Mnemonic                          Encoding
                                  VEX  RXB.mapselect  W.vvvv.L.pp  Opcode
VMAXSD xmm1, xmm2, xmm3/mem64     C4   RXB.00001      X.src.X.11   5F /r

Related Instructions 

(V)MAXPD, (V)MAXPS, (V)MAXSS, (V)MINPD, (V)MINPS, (V)MINSD, (V)MINSS 
MXCSR Flags Affected

MM  FZ  RC      PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                                             M   M
17  15  14  13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 


MAXSS, VMAXSS Maximum Scalar Single-Precision Floating-Point

Compares the scalar single-precision floating-point value in the low-order 32 bits of the first source 
operand to a corresponding value in the second source operand and writes the numerically greater 
value into the low-order 32 bits of the destination. 

If both source operands are equal to zero, the value of the second source operand is returned. If either 
operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source 
operand is written to the destination. 

There are legacy and extended forms of the instruction:

MAXSS 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 32-bit memory location. The first source register is also the destination. Bits [127:32] of the
destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are
not affected.

VMAXSS 

The extended form of the instruction has a 128-bit encoding only. 

The first source operand is an XMM register and the second source operand is either an XMM register
or a 32-bit memory location. The destination is an XMM register. Bits [127:32] of the destination
are copied from the first source operand. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.

Instruction Support

Form      Subset   Feature Flag
MAXSS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMAXSS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                          Opcode        Description
MAXSS xmm1, xmm2/mem32            F3 0F 5F /r   Compares a pair of scalar single-precision values in the low-order 32 bits of xmm1 and xmm2 or mem32 and writes the greater value to the low-order 32 bits of xmm1.

Mnemonic                          Encoding
                                  VEX  RXB.mapselect  W.vvvv.L.pp  Opcode
VMAXSS xmm1, xmm2, xmm3/mem32     C4   RXB.00001      X.src.X.10   5F /r

Related Instructions 

(V)MAXPD, (V)MAXPS, (V)MAXSD, (V)MINPD, (V)MINPS, (V)MINSD, (V)MINSS 
MXCSR Flags Affected

MM  FZ  RC      PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                                             M   M
17  15  14  13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
MINPD Minimum

VMINPD Packed Double-Precision Floating-Point

Compares each packed double-precision floating-point value of the first source operand to the
corresponding value of the second source operand and writes the numerically lesser value into the
corresponding location of the destination.

If both source operands are equal to zero, the value of the second source operand is returned. If either
operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source
operand is written to the destination.
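
The selection rule described above can be modelled per element. The following is a minimal sketch using Python floats as stand-ins for double-precision values. It assumes invalid-operation exceptions are masked and does not model MXCSR flag updates or denormal handling. The formulation "return the first source only when it compares strictly less than the second" is an assumption chosen because it reproduces both special cases stated in the text; the names `min_lane` and `minpd` are hypothetical.

```python
def min_lane(src1: float, src2: float) -> float:
    """Model one element of the minimum operation.

    The first source is returned only when it compares strictly less than
    the second. Any comparison involving a NaN is false, and +0.0 and -0.0
    compare equal, so both special cases fall through to the second source.
    """
    return src1 if src1 < src2 else src2


def minpd(src1, src2):
    """Apply min_lane to each pair of corresponding elements."""
    return [min_lane(a, b) for a, b in zip(src1, src2)]
```

One consequence of this rule is that the operation is not commutative: swapping the operands changes the result when one input is a NaN or when the inputs are zeros of opposite sign.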

There are legacy and extended forms of the instruction:

MINPD 

Compares two pairs of packed double-precision floating-point values. 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VMINPD 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Compares two pairs of packed double-precision floating-point values. 

The first source operand is an XMM register and the second source operand is either an XMM register
or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

Compares four pairs of packed double-precision floating-point values. 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination is a YMM register. 

Instruction Support 


Form       Subset   Feature Flag
MINPD      SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMINPD     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                    Opcode        Description
MINPD xmm1, xmm2/mem128     66 0F 5D /r   Compares two pairs of packed double-precision values in
                                          xmm1 and xmm2 or mem128 and writes the lesser value
                                          to the corresponding position in xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMINPD xmm1, xmm2, xmm3/mem128      C4    RXB.00001        X.src.0.01    5D /r
VMINPD ymm1, ymm2, ymm3/mem256      C4    RXB.00001        X.src.1.01    5D /r
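
To make the encoding table concrete, the sketch below assembles the bytes for the register-to-register form of VMINPD. It rests on several assumptions that are not stated in this section: the R, X, B, and vvvv fields are stored in inverted form, as in the general VEX prefix definition; W, shown as X (ignored) in the table, is set to 0; only registers 0 through 7 are handled, so no register-extension bits are needed; and the ModRM byte uses mod = 11b with the destination in the reg field and the second source in the r/m field. The function names are hypothetical.

```python
def vex3_prefix(r, x, b, map_select, w, vvvv, l, pp):
    """Assemble the three bytes of a C4-form VEX prefix.

    r, x, b and vvvv are passed as logical values and stored inverted.
    """
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (map_select & 0x1F)
    byte2 = ((w & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 0b11)
    return bytes([0xC4, byte1, byte2])


def encode_vminpd_reg(dest, src1, src2, ymm=False):
    """Encode VMINPD dest, src1, src2 for register numbers 0 to 7 (sketch)."""
    for reg in (dest, src1, src2):
        if not 0 <= reg <= 7:
            raise ValueError("this sketch handles registers 0-7 only")
    # map_select = 00001b and pp = 01b come from the encoding table; L selects
    # the 128-bit (0) or 256-bit (1) form; vvvv carries the first source.
    prefix = vex3_prefix(0, 0, 0, 0b00001, 0, src1, 1 if ymm else 0, 0b01)
    modrm = 0xC0 | (dest << 3) | src2  # mod = 11b, reg = dest, r/m = src2
    return prefix + bytes([0x5D, modrm])
```

Assemblers commonly emit a shorter two-byte VEX prefix when the instruction allows it, so the bytes produced here may differ from assembler output while describing the same operation.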


Related Instructions 

(V)MAXPD, (V)MAXPS, (V)MAXSD, (V)MAXSS, (V)MINPS, (V)MINSD, (V)MINSS 

MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                                         M    M
17   15   14 13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
MINPS Minimum

VMINPS Packed Single-Precision Floating-Point

Compares each packed single-precision floating-point value of the first source operand to the
corresponding value of the second source operand and writes the numerically lesser value into the
corresponding location of the destination.

If both source operands are equal to zero, the value of the second source operand is returned. If either
operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source
operand is written to the destination.

There are legacy and extended forms of the instruction:

MINPS 

Compares four pairs of packed single-precision floating-point values. 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VMINPS 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Compares four pairs of packed single-precision floating-point values. 

The first source operand is an XMM register and the second source operand is either an XMM register
or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

Compares eight pairs of packed single-precision floating-point values. 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination is a YMM register. 

Instruction Support 


Form       Subset   Feature Flag
MINPS      SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMINPS     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                    Opcode     Description
MINPS xmm1, xmm2/mem128     0F 5D /r   Compares four pairs of packed single-precision values in
                                       xmm1 and xmm2 or mem128 and writes the lesser values
                                       to the corresponding positions in xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMINPS xmm1, xmm2, xmm3/mem128      C4    RXB.00001        X.src.0.00    5D /r
VMINPS ymm1, ymm2, ymm3/mem256      C4    RXB.00001        X.src.1.00    5D /r


Related Instructions 

(V)MAXPD, (V)MAXPS, (V)MAXSD, (V)MAXSS, (V)MINPD, (V)MINSD, (V)MINSS 

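MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                                         M    M
17   15   14 13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.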
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
MINSD Minimum

VMINSD Scalar Double-Precision Floating-Point 

Compares the scalar double-precision floating-point value in the low-order 64 bits of the first source 
operand to a corresponding value in the second source operand and writes the numerically lesser 
value into the low-order 64 bits of the destination. 

If both source operands are equal to zero, the value of the second source operand is returned. If either 
operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source 
operand is written to the destination. 

There are legacy and extended forms of the instruction:

MINSD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 64-bit memory location. The first source register is also the destination. Bits [127:64] of the
destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination
are not affected.

VMINSD 

The extended form of the instruction has a 128-bit encoding only. 

The first source operand is an XMM register and the second source operand is either an XMM register
or a 64-bit memory location. The destination is an XMM register. Bits [127:64] of the destination
are copied from the first source operand. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.
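
The difference between the legacy and extended forms lies in what happens to the bits above the scalar result. The sketch below models a YMM register as a list of four 64-bit elements, index 0 holding bits [63:0]. It is an illustration under stated assumptions: Python floats stand in for double-precision values, a cleared element is represented as 0.0, exceptions and MXCSR updates are not modelled, and the function names are hypothetical.

```python
def min_lane(src1: float, src2: float) -> float:
    """Return src1 only when it is strictly less than src2, else src2."""
    return src1 if src1 < src2 else src2


def minsd_legacy(dest_ymm, src2_low):
    """Legacy form: the destination is also the first source.

    Only element 0 (bits [63:0]) is written; bits [127:64] and [255:128]
    of the destination keep their previous contents.
    """
    result = list(dest_ymm)
    result[0] = min_lane(dest_ymm[0], src2_low)
    return result


def vminsd(src1_xmm, src2_low):
    """Extended form: separate destination.

    Element 0 receives the minimum, bits [127:64] are copied from the first
    source, and bits [255:128] of the destination YMM register are cleared.
    """
    return [min_lane(src1_xmm[0], src2_low), src1_xmm[1], 0.0, 0.0]
```

For example, with a destination holding [3.0, 7.0, 8.0, 9.0] and a second source of 2.0, the legacy model leaves the three upper elements untouched, while the extended model zeroes the upper two.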

Instruction Support 


Form       Subset   Feature Flag
MINSD      SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMINSD     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                   Opcode        Description
MINSD xmm1, xmm2/mem64     F2 0F 5D /r   Compares a pair of scalar double-precision values in the
                                         low-order 64 bits of xmm1 and xmm2 or mem64 and
                                         writes the lesser value to the low-order 64 bits of xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMINSD xmm1, xmm2, xmm3/mem64      C4    RXB.00001        X.src.X.11    5D /r

Related Instructions 

(V)MAXPD, (V)MAXPS, (V)MAXSD, (V)MAXSS, (V)MINPD, (V)MINPS, (V)MINSS 
MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                                         M    M
17   15   14 13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 



AMD64 Technology 


26568 — Rev. 3.23—February 2019 


MINSS Minimum 

VMINSS Scalar Single-Precision Floating-Point 

Compares the scalar single-precision floating-point value in the low-order 32 bits of the first source 
operand to a corresponding value in the second source operand and writes the numerically lesser 
value into the low-order 32 bits of the destination. 

If both source operands are equal to zero, the value of the second source operand is returned. If either 
operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source 
operand is written to the destination. 

There are legacy and extended forms of the instruction:

MINSS

The first source operand is an XMM register. The second source operand is either an XMM register or
a 32-bit memory location. The first source register is also the destination. Bits [127:32] of the
destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination
are not affected.

VMINSS 

The extended form of the instruction has a 128-bit encoding only. 

The first source operand is an XMM register and the second source operand is either an XMM register
or a 32-bit memory location. The destination is an XMM register. Bits [127:32] of the destination
are copied from the first source operand. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.

Instruction Support 


Form       Subset   Feature Flag
MINSS      SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMINSS     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                   Opcode        Description
MINSS xmm1, xmm2/mem32     F3 0F 5D /r   Compares a pair of scalar single-precision values in the
                                         low-order 32 bits of xmm1 and xmm2 or mem32 and
                                         writes the lesser value to the low-order 32 bits of xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMINSS xmm1, xmm2, xmm3/mem32      C4    RXB.00001        X.src.X.10    5D /r

Related Instructions 

(V)MAXPD, (V)MAXPS, (V)MAXSD, (V)MAXSS, (V)MINPD, (V)MINPS, (V)MINSD 
MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                                         M    M
17   15   14 13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 


                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 


MOVAPD Move Aligned

VMOVAPD Packed Double-Precision Floating-Point

Moves packed double-precision floating-point values. Values can be moved from a register or memory
location to a register; or from a register to a register or memory location.

A memory operand that is not aligned causes a general-protection exception. 

There are legacy and extended forms of the instruction: 

MOVAPD 

Moves two double-precision floating-point values. There are encodings for each type of move. 

• The source operand is either an XMM register or a 128-bit memory location. The destination 
operand is an XMM register. 

• The source operand is an XMM register. The destination operand is either an XMM register or a 
128-bit memory location. 

Bits [255:128] of the YMM register that corresponds to the destination are not affected. 

VMOVAPD 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding 

Moves two double-precision floating-point values. There are encodings for each type of move: 

• The source operand is either an XMM register or a 128-bit memory location. The destination 
operand is an XMM register. 

• The source operand is an XMM register. The destination operand is either an XMM register or a 
128-bit memory location. 

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

Moves four double-precision floating-point values. There are encodings for each type of move: 

• The source operand is either a YMM register or a 256-bit memory location. The destination 
operand is a YMM register. 

• The source operand is a YMM register. The destination operand is either a YMM register or a 
256-bit memory location. 
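
The alignment requirement can be expressed as a simple address check. The sketch below assumes the requirement equals the operand size in bytes: 16 bytes for the legacy and 128-bit extended forms and 32 bytes for the 256-bit extended form, as listed in the exception table for this instruction. The function names are hypothetical, and the model covers only the alignment condition, not the other causes of a general-protection exception.

```python
def required_alignment(operand_bits: int) -> int:
    """Return the alignment in bytes for a 128-bit or 256-bit memory operand."""
    if operand_bits not in (128, 256):
        raise ValueError("operand size must be 128 or 256 bits")
    return operand_bits // 8


def movapd_would_fault(address: int, operand_bits: int = 128) -> bool:
    """True if the memory operand address violates the alignment requirement."""
    return address % required_alignment(operand_bits) != 0
```

An address such as 0x1010 is acceptable for a 128-bit operand but not for a 256-bit one, which is why buffers shared between the XMM and YMM forms are usually allocated on 32-byte boundaries.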


Instruction Support 


Form       Subset   Feature Flag
MOVAPD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVAPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                     Opcode        Description
MOVAPD xmm1, xmm2/mem128     66 0F 28 /r   Moves two packed double-precision floating-point
                                           values from xmm2 or mem128 to xmm1.
MOVAPD xmm1/mem128, xmm2     66 0F 29 /r   Moves two packed double-precision floating-point
                                           values from xmm2 to xmm1 or mem128.

Mnemonic                       Encoding
                               VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVAPD xmm1, xmm2/mem128      C4    RXB.00001        X.1111.0.01   28 /r
VMOVAPD xmm1/mem128, xmm2      C4    RXB.00001        X.1111.0.01   29 /r
VMOVAPD ymm1, ymm2/mem256      C4    RXB.00001        X.1111.1.01   28 /r
VMOVAPD ymm1/mem256, ymm2      C4    RXB.00001        X.1111.1.01   29 /r


Related Instructions 

(V)MOVHPD, (V)MOVLPD, (V)MOVMSKPD, (V)MOVSD, (V)MOVUPD 


Exceptions 


                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Memory operand not aligned on a 16-byte boundary.
                            S     S     X     Write to a read-only data segment.
                                        A     VEX256: Memory operand not 32-byte aligned.
                                              VEX128: Memory operand not 16-byte aligned.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
MOVAPS Move Aligned

VMOVAPS Packed Single-Precision Floating-Point 

Moves packed single-precision floating-point values. Values can be moved from a register or memory 
location to a register; or from a register to a register or memory location. 

A memory operand that is not aligned causes a general-protection exception. 

There are legacy and extended forms of the instruction: 

MOVAPS 

Moves four single-precision floating-point values. 

There are encodings for each type of move. 

• The source operand is either an XMM register or a 128-bit memory location. The destination 
operand is an XMM register. 

• The source operand is an XMM register. The destination operand is either an XMM register or a 
128-bit memory location. 

Bits [255:128] of the YMM register that corresponds to the destination are not affected. 

VMOVAPS 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Moves four single-precision floating-point values. There are encodings for each type of move. 

• The source operand is either an XMM register or a 128-bit memory location. The destination 
operand is an XMM register. 

• The source operand is an XMM register. The destination operand is either an XMM register or a 
128-bit memory location. 

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

Moves eight single-precision floating-point values. There are encodings for each type of move. 

• The source operand is either a YMM register or a 256-bit memory location. The destination 
operand is a YMM register. 

• The source operand is a YMM register. The destination operand is either a YMM register or a 
256-bit memory location. 


Instruction Support 


Form       Subset   Feature Flag
MOVAPS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVAPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                     Opcode     Description
MOVAPS xmm1, xmm2/mem128     0F 28 /r   Moves four packed single-precision floating-point
                                        values from xmm2 or mem128 to xmm1.
MOVAPS xmm1/mem128, xmm2     0F 29 /r   Moves four packed single-precision floating-point
                                        values from xmm2 to xmm1 or mem128.

Mnemonic                       Encoding
                               VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVAPS xmm1, xmm2/mem128      C4    RXB.00001        X.1111.0.00   28 /r
VMOVAPS xmm1/mem128, xmm2      C4    RXB.00001        X.1111.0.00   29 /r
VMOVAPS ymm1, ymm2/mem256      C4    RXB.00001        X.1111.1.00   28 /r
VMOVAPS ymm1/mem256, ymm2      C4    RXB.00001        X.1111.1.00   29 /r

Related Instructions

(V)MOVHLPS, (V)MOVHPS, (V)MOVLHPS, (V)MOVLPS, (V)MOVMSKPS, (V)MOVSS,
(V)MOVUPS


Exceptions 


                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Memory operand not aligned on a 16-byte boundary.
                            S     S     X     Write to a read-only data segment.
                                        A     VEX256: Memory operand not 32-byte aligned.
                                              VEX128: Memory operand not 16-byte aligned.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
MOVD Move

VMOVD Doubleword or Quadword

Moves 32-bit and 64-bit values. A value can be moved from a general-purpose register or memory
location to the corresponding low-order bits of an XMM register, with zero-extension to 128 bits; or
from the low-order bits of an XMM register to a general-purpose register or memory location.

The quadword form of this instruction is distinct from the differently-encoded (V)MOVQ instruction.

There are legacy and extended forms of the instruction:

MOVD 

There are two encodings for 32-bit moves, characterized by REX.W = 0. 

• The source operand is either a 32-bit general-purpose register or a 32-bit memory location. The 
destination is an XMM register. The 32-bit value is zero-extended to 128 bits. 

• The source operand is an XMM register. The destination is either a 32-bit general-purpose register 
or a 32-bit memory location. 

There are two encodings for 64-bit moves, characterized by REX.W = 1. 

• The source operand is either a 64-bit general-purpose register or a 64-bit memory location. The 
destination is an XMM register. The 64-bit value is zero-extended to 128 bits. 

• The source operand is an XMM register. The destination is either a 64-bit general-purpose register 
or a 64-bit memory location. 

Bits [255:128] of the YMM register that corresponds to the destination are not affected. 

VMOVD 

The extended form of the instruction has four 128-bit encodings: 

There are two encodings for 32-bit moves, characterized by VEX.W = 0. 

• The source operand is either a 32-bit general-purpose register or a 32-bit memory location. The 
destination is an XMM register. The 32-bit value is zero-extended to 128 bits. 

• The source operand is an XMM register. The destination is either a 32-bit general-purpose register 
or a 32-bit memory location. 

There are two encodings for 64-bit moves, characterized by VEX.W = 1. 

• The source operand is either a 64-bit general-purpose register or a 64-bit memory location. The 
destination is an XMM register. The 64-bit value is zero-extended to 128 bits. 

• The source operand is an XMM register. The destination is either a 64-bit general-purpose register 
or a 64-bit memory location. 

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 
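
The two directions of the move can be modelled with integers standing in for register contents. This is a minimal sketch: an XMM register is represented as a Python integer of up to 128 bits, the `width` argument (32 or 64) plays the role that REX.W or VEX.W plays in the encodings, and the function names are hypothetical. The handling of bits [255:128] of the corresponding YMM register, which differs between the legacy and extended forms, is not modelled.

```python
def movd_to_xmm(value: int, width: int) -> int:
    """Model a move from a general-purpose register or memory into XMM.

    The low `width` bits of the source are kept and the result is
    zero-extended to 128 bits.
    """
    if width not in (32, 64):
        raise ValueError("width must be 32 or 64")
    return value & ((1 << width) - 1)


def movd_from_xmm(xmm: int, width: int) -> int:
    """Model a move of the low `width` bits of XMM to a register or memory."""
    if width not in (32, 64):
        raise ValueError("width must be 32 or 64")
    return xmm & ((1 << width) - 1)
```

Because the load direction zero-extends, any previous contents of the upper XMM bits are discarded, whereas the store direction simply ignores them.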

Instruction Support 


Form       Subset   Feature Flag
MOVD       SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVD      AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                  Opcode             Description
MOVD xmm, reg32/mem32     66 (W0) 0F 6E /r   Move a 32-bit value from reg32/mem32 to xmm.
MOVD xmm, reg64/mem64     66 (W1) 0F 6E /r   Move a 64-bit value from reg64/mem64 to xmm.
MOVD reg32/mem32, xmm     66 (W0) 0F 7E /r   Move a 32-bit value from xmm to reg32/mem32.
MOVD reg64/mem64, xmm     66 (W1) 0F 7E /r   Move a 64-bit value from xmm to reg64/mem64.

Mnemonic                    Encoding
                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVD¹ xmm, reg32/mem32     C4    RXB.00001        0.1111.0.01   6E /r
VMOVQ xmm, reg64/mem64      C4    RXB.00001        1.1111.0.01   6E /r
VMOVD¹ reg32/mem32, xmm     C4    RXB.00001        0.1111.0.01   7E /r
VMOVQ reg64/mem64, xmm      C4    RXB.00001        1.1111.0.01   7E /r
Note: 1. Also known as MOVQ in some developer tools.

Related Instructions

(V)MOVDQA, (V)MOVDQU, (V)MOVQ


Exceptions 


                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     VEX.L = 1.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     X     Write to a read-only data segment.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
MOVDDUP Move and Duplicate

VMOVDDUP Double-Precision Floating-Point 

Moves and duplicates double-precision floating-point values. 

There are legacy and extended forms of the instruction: 

MOVDDUP 

Moves and duplicates one quadword value. 

The source operand is either the low 64 bits of an XMM register or the address of the least-significant 
byte of 64 bits of data in memory. The destination is an XMM register. Bits [255:128] of the YMM 
register that corresponds to the destination are not affected. 

VMOVDDUP 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding 

Moves and duplicates one quadword value. 

The source operand is either the low 64 bits of an XMM register or the address of the least-significant 
byte of 64 bits of data in memory. The destination is an XMM register. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. 

YMM Encoding 

Moves and duplicates two even-indexed quadword values. 

The source operand is either a YMM register or the address of the least-significant byte of 256 bits of 
data in memory. The destination is a YMM register. Bits [63:0] of the source are written to bits
[127:64] and [63:0] of the destination; bits [191:128] of the source are written to bits [255:192] and 
[191:128] of the destination. 
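The three behaviors described above can be modeled in software. The following Python sketch is illustrative only: it treats a YMM register as a 256-bit integer, and the function names are hypothetical rather than part of any AMD-provided interface.

```python
MASK64 = (1 << 64) - 1

def movddup(dest_ymm: int, src_low64: int) -> int:
    """Legacy MOVDDUP: duplicate one quadword; bits [255:128] are not affected."""
    q = src_low64 & MASK64
    upper = (dest_ymm >> 128) << 128      # preserve bits [255:128]
    return upper | (q << 64) | q

def vmovddup_xmm(src_low64: int) -> int:
    """VMOVDDUP, XMM encoding: duplicate one quadword; bits [255:128] are cleared."""
    q = src_low64 & MASK64
    return (q << 64) | q

def vmovddup_ymm(src_ymm: int) -> int:
    """VMOVDDUP, YMM encoding: duplicate the two even-indexed quadwords."""
    q0 = src_ymm & MASK64                 # source bits [63:0]
    q2 = (src_ymm >> 128) & MASK64        # source bits [191:128]
    return (q2 << 192) | (q2 << 128) | (q0 << 64) | q0
```

In this model the odd-indexed quadwords of the 256-bit source (bits [127:64] and [255:192]) never reach the destination.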

Instruction Support

Form        Subset  Feature Flag
MOVDDUP     SSE3    CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VMOVDDUP    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                     Opcode       Description
MOVDDUP xmm1, xmm2/mem64     F2 0F 12 /r  Moves two copies of the low 64 bits of xmm2 or
                                          mem64 to xmm1.

                             Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVDDUP xmm1, xmm2/mem64    C4   RXB.00001       X.1111.0.11  12 /r
VMOVDDUP ymm1, ymm2/mem256   C4   RXB.00001       X.1111.1.11  12 /r
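As an illustration of how the VEX columns in the encoding table combine into bytes, the following sketch assembles a three-byte VEX prefix. It assumes the usual three-byte VEX layout (C4h, then RXB followed by map_select, then W, vvvv, L, and pp, most-significant field first) and takes each field exactly as it is written in the table, that is, already in its stored form. The function name `vex3_prefix` is hypothetical, and the don't-care W bit (shown as X) is passed as 0.

```python
def vex3_prefix(rxb: int, map_select: int, w: int, vvvv: int, l: int, pp: int) -> bytes:
    """Assemble a three-byte VEX prefix from field values as stored in the prefix."""
    byte1 = ((rxb & 0b111) << 5) | (map_select & 0b11111)
    byte2 = ((w & 1) << 7) | ((vvvv & 0b1111) << 3) | ((l & 1) << 2) | (pp & 0b11)
    return bytes([0xC4, byte1, byte2])

# VMOVDDUP, XMM encoding: RXB.00001, X.1111.0.11 (RXB = 111b assumed, W = 0 assumed)
xmm_prefix = vex3_prefix(0b111, 0b00001, 0, 0b1111, 0, 0b11)
# VMOVDDUP, YMM encoding: same fields with L = 1
ymm_prefix = vex3_prefix(0b111, 0b00001, 0, 0b1111, 1, 0b11)
```

Under these assumptions the prefix is C4 E1 7B for the XMM form and C4 E1 7F for the YMM form; the opcode byte 12h and the ModRM byte follow the prefix.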
Related Instructions

(V)MOVSHDUP, (V)MOVSLDUP 

Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
MOVDQA Move Aligned

VMOVDQA Double Quadword 

Moves aligned packed integer values. Values can be moved from a register or a memory location to a 
register, or from a register to a register or a memory location. 

A memory operand that is not aligned causes a general-protection exception. 

There are legacy and extended forms of the instruction: 

MOVDQA 

Moves two aligned quadwords (128-bit move). There are two encodings. 

• The source operand is an XMM register. The destination is either an XMM register or a 128-bit 
memory location. 

• The source operand is either an XMM register or a 128-bit memory location. The destination is an 
XMM register. 

Bits [255:128] of the YMM register that corresponds to the destination are not affected. 

VMOVDQA 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding 

Moves two aligned quadwords (128-bit move). There are two encodings. 

• The source operand is an XMM register. The destination is either an XMM register or a 128-bit 
memory location. 

• The source operand is either an XMM register or a 128-bit memory location. The destination is an 
XMM register. 

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

Moves four aligned quadwords (256-bit move). There are two encodings. 

• The source operand is a YMM register. The destination is either a YMM register or a 256-bit 
memory location. 

• The source operand is either a YMM register or a 256-bit memory location. The destination is a 
YMM register. 
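The alignment requirement can be expressed as a simple address test. This sketch assumes the natural alignment listed in the exception table for this instruction (16 bytes for 128-bit memory operands, 32 bytes for 256-bit memory operands); the helper name is hypothetical, and the model says nothing about any mode that relaxes the check.

```python
def movdqa_operand_aligned(address: int, operand_bits: int) -> bool:
    """Return True if a MOVDQA/VMOVDQA memory operand at `address` is aligned.

    operand_bits is 128 for the legacy and XMM encodings, 256 for the YMM encoding.
    A False result corresponds to the general-protection case described above.
    """
    if operand_bits not in (128, 256):
        raise ValueError("operand size must be 128 or 256 bits")
    required_alignment = operand_bits // 8   # 16 or 32 bytes
    return address % required_alignment == 0
```

For example, address 1010h is acceptable for a 128-bit operand but not for a 256-bit operand.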


Instruction Support

Form        Subset  Feature Flag
MOVDQA      SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVDQA     AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
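A minimal sketch of testing these feature flags, assuming the caller has already obtained the ECX and EDX values returned by CPUID Fn0000_0001 and, for the AVX form, the value of XFEATURE_ENABLED_MASK. The OSXSAVE bit position (ECX bit 27) is not stated in this section and is taken from the general CPUID definition; the function names are hypothetical.

```python
# Bit positions within the registers returned by CPUID Fn0000_0001.
EDX_SSE = 25
EDX_SSE2 = 26
ECX_SSE3 = 0
ECX_OSXSAVE = 27   # assumption: not listed in this section
ECX_AVX = 28

def bit_set(value: int, bit: int) -> bool:
    """Return True if the given bit of value is 1."""
    return ((value >> bit) & 1) == 1

def movdqa_supported(edx: int) -> bool:
    """Legacy MOVDQA belongs to the SSE2 subset."""
    return bit_set(edx, EDX_SSE2)

def vmovdqa_usable(ecx: int, xfeature_enabled_mask: int) -> bool:
    """VMOVDQA needs AVX, OSXSAVE, and XFEATURE_ENABLED_MASK[2:1] = 11b."""
    return (bit_set(ecx, ECX_AVX)
            and bit_set(ecx, ECX_OSXSAVE)
            and ((xfeature_enabled_mask >> 1) & 0b11) == 0b11)
```

The last two conditions mirror the #UD causes listed for the AVX form in the exception table.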
Instruction Encoding

Mnemonic                     Opcode       Description
MOVDQA xmm1, xmm2/mem128     66 0F 6F /r  Moves aligned packed integer values from xmm2
                                          or mem128 to xmm1.
MOVDQA xmm1/mem128, xmm2     66 0F 7F /r  Moves aligned packed integer values from xmm2
                                          to xmm1 or mem128.

                             Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVDQA xmm1, xmm2/mem128    C4   RXB.00001       X.1111.0.01  6F /r
VMOVDQA xmm1/mem128, xmm2    C4   RXB.00001       X.1111.0.01  7F /r
VMOVDQA ymm1, ymm2/mem256    C4   RXB.00001       X.1111.1.01  6F /r
VMOVDQA ymm1/mem256, ymm2    C4   RXB.00001       X.1111.1.01  7F /r


Related Instructions 

(V)MOVD, (V)MOVDQU, (V)MOVQ 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     S     Memory operand not aligned on a 16-byte boundary.
                           S     S     X     Write to a read-only data segment.
                                       A     VEX256: Memory operand not 32-byte aligned.
                                             VEX128: Memory operand not 16-byte aligned.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
MOVDQU Move

VMOVDQU Unaligned Double Quadword

Moves unaligned packed integer values. Values can be moved from a register or a memory location to
a register, or from a register to a register or a memory location.

There are legacy and extended forms of the instruction: 

MOVDQU 

Moves two unaligned quadwords (128-bit move). There are two encodings. 

• The source operand is an XMM register. The destination is either an XMM register or a 128-bit 
memory location. 

• The source operand is either an XMM register or a 128-bit memory location. The destination is an 
XMM register. 

Bits [255:128] of the YMM register that corresponds to the destination are not affected. 

VMOVDQU 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding 

Moves two unaligned quadwords (128-bit move). There are two encodings: 

• The source operand is an XMM register. The destination is either an XMM register or a 128-bit 
memory location. 

• The source operand is either an XMM register or a 128-bit memory location. The destination is an 
XMM register. 

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

Moves four unaligned quadwords (256-bit move). There are two encodings: 

• The source operand is a YMM register. The destination is either a YMM register or a 256-bit 
memory location. 

• The source operand is either a YMM register or a 256-bit memory location. The destination is a 
YMM register. 
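The difference between the legacy and extended 128-bit loads lies in the upper half of the corresponding YMM register. The sketch below models that difference and reads the operand from an arbitrary byte offset to reflect that no alignment is required. It treats registers as Python integers and memory as a bytes object; all names are hypothetical.

```python
MASK128 = (1 << 128) - 1

def load128(memory: bytes, offset: int) -> int:
    """Read 16 bytes at any byte offset as a little-endian 128-bit value."""
    return int.from_bytes(memory[offset:offset + 16], "little")

def movdqu_load(dest_ymm: int, value128: int) -> int:
    """Legacy MOVDQU load: bits [255:128] of the destination are not affected."""
    upper = (dest_ymm >> 128) << 128
    return upper | (value128 & MASK128)

def vmovdqu_load_xmm(value128: int) -> int:
    """VMOVDQU load, XMM encoding: bits [255:128] of the destination are cleared."""
    return value128 & MASK128
```

With a destination whose upper half holds FFFFh, `movdqu_load` keeps that value while `vmovdqu_load_xmm` returns a result with a zero upper half.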


Instruction Support

Form        Subset  Feature Flag
MOVDQU      SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVDQU     AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                     Opcode       Description
MOVDQU xmm1, xmm2/mem128     F3 0F 6F /r  Moves unaligned packed integer values from xmm2 or
                                          mem128 to xmm1.
MOVDQU xmm1/mem128, xmm2     F3 0F 7F /r  Moves unaligned packed integer values from xmm2 to
                                          xmm1 or mem128.

                             Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVDQU xmm1, xmm2/mem128    C4   RXB.00001       X.1111.0.10  6F /r
VMOVDQU xmm1/mem128, xmm2    C4   RXB.00001       X.1111.0.10  7F /r
VMOVDQU ymm1, ymm2/mem256    C4   RXB.00001       X.1111.1.10  6F /r
VMOVDQU ymm1/mem256, ymm2    C4   RXB.00001       X.1111.1.10  7F /r

Related Instructions

(V)MOVD, (V)MOVDQA, (V)MOVQ


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     X     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 


MOVHLPS Move High to Low

VMOVHLPS Packed Single-Precision Floating-Point

Moves two packed single-precision floating-point values from the high quadword of an XMM register
to the low quadword of an XMM register.

There are legacy and extended forms of the instruction:

MOVHLPS

The source operand is bits [127:64] of an XMM register. The destination is bits [63:0] of an XMM
register. Bits [127:64] of the destination are not affected. Bits [255:128] of the YMM register that
corresponds to the destination are not affected.

VMOVHLPS

The extended form of the instruction has a 128-bit encoding only.

The source operands are bits [127:64] of two XMM registers. The destination is a third XMM register.
Bits [127:64] of the first source are moved to bits [127:64] of the destination; bits [127:64] of the
second source are moved to bits [63:0] of the destination. Bits [255:128] of the YMM register that
corresponds to the destination are cleared.
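The data movement of both forms can be sketched with integer arithmetic. The model below treats registers as Python integers (bit 0 is the least-significant bit); the function names are hypothetical and the code only illustrates the bit ranges named in the text.

```python
MASK64 = (1 << 64) - 1

def movhlps(dest_ymm: int, src_xmm: int) -> int:
    """Legacy form: only bits [63:0] of the destination change."""
    high_src = (src_xmm >> 64) & MASK64
    return ((dest_ymm >> 64) << 64) | high_src

def vmovhlps(src1_xmm: int, src2_xmm: int) -> int:
    """Extended form: high quadwords of both sources; bits [255:128] cleared."""
    high1 = (src1_xmm >> 64) & MASK64     # goes to destination bits [127:64]
    high2 = (src2_xmm >> 64) & MASK64     # goes to destination bits [63:0]
    return (high1 << 64) | high2
```

When the legacy destination is also used as the first source of the extended form, the two results agree in bits [127:0] and differ only in bits [255:128].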

Instruction Support

Form        Subset  Feature Flag
MOVHLPS     SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVHLPS    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                     Opcode       Description
MOVHLPS xmm1, xmm2           0F 12 /r     Moves two packed single-precision floating-point
                                          values from xmm2[127:64] to xmm1[63:0].

                             Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVHLPS xmm1, xmm2, xmm3    C4   RXB.00001       X.src.0.00   12 /r


Related Instructions 

(V)MOVAPS, (V)MOVHPS, (V)MOVLHPS, (V)MOVLPS, (V)MOVMSKPS, (V)MOVSS, 
(V)MOVUPS 
Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 


MOVHPD Move High

VMOVHPD Packed Double-Precision Floating-Point

Moves a packed double-precision floating-point value. Values can be moved from a 64-bit memory
location to the high-order quadword of an XMM register, or from the high-order quadword of an
XMM register to a 64-bit memory location.

There are legacy and extended forms of the instruction: 

MOVHPD 

There are two encodings. 

• The source operand is a 64-bit memory location. The destination is bits [127:64] of an XMM 
register. 

• The source operand is bits [127:64] of an XMM register. The destination is a 64-bit memory 
location. 

Bits [255:128] of the YMM register that corresponds to the destination are not affected. 

VMOVHPD 

The extended form of the instruction has two 128-bit encodings: 

• There are two source operands. The first source is an XMM register. The second source is a 64-bit 
memory location. The destination is an XMM register. Bits [63:0] of the source register are written 
to bits [63:0] of the destination; bits [63:0] of the source memory location are written to bits 
[127:64] of the destination. 

• The source operand is bits [127:64] of an XMM register. The destination is a 64-bit memory 
location. 

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 
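The load and store encodings described above can be modeled as follows. This is a sketch only: registers are Python integers, the 64-bit memory operand is passed as an integer, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def movhpd_load(dest_ymm: int, mem64: int) -> int:
    """Legacy load: write bits [127:64]; bits [63:0] and [255:128] are kept."""
    low = dest_ymm & MASK64
    upper = (dest_ymm >> 128) << 128
    return upper | ((mem64 & MASK64) << 64) | low

def vmovhpd_load(src_xmm: int, mem64: int) -> int:
    """Extended load: low quadword from the register source, high from memory."""
    return ((mem64 & MASK64) << 64) | (src_xmm & MASK64)

def movhpd_store(src_xmm: int) -> int:
    """Store (either form): the value written to memory is bits [127:64]."""
    return (src_xmm >> 64) & MASK64
```

The extended load returns a value with no bits set above bit 127, which corresponds to clearing bits [255:128] of the destination.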


Instruction Support

Form        Subset  Feature Flag
MOVHPD      SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVHPD     AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                     Opcode       Description
MOVHPD xmm1, mem64           66 0F 16 /r  Moves a packed double-precision floating-point value from
                                          mem64 to xmm1[127:64].
MOVHPD mem64, xmm1           66 0F 17 /r  Moves a packed double-precision floating-point value from
                                          xmm1[127:64] to mem64.

                             Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVHPD xmm1, xmm2, mem64    C4   RXB.00001       X.src.0.01   16 /r
VMOVHPD mem64, xmm1          C4   RXB.00001       X.1111.0.01  17 /r


Related Instructions 

(V)MOVAPD, (V)MOVLPD, (V)MOVMSKPD, (V)MOVSD, (V)MOVUPD 

Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b (for memory destination encoding only).
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 



AMD64 Technology 


26568 — Rev. 3.23—February 2019 


MOVHPS Move High 

VMOVHPS Packed Single-Precision Floating-Point 

Moves two packed single-precision floating-point values. Values can be moved from a 64-bit memory
location to the high-order quadword of an XMM register, or from the high-order quadword of an
XMM register to a 64-bit memory location.

There are legacy and extended forms of the instruction: 

MOVHPS 

There are two encodings. 

• The source operand is a 64-bit memory location. The destination is bits [127:64] of an XMM 
register. 

• The source operand is bits [127:64] of an XMM register. The destination is a 64-bit memory 
location. 

Bits [255:128] of the YMM register that corresponds to the destination are not affected. 

VMOVHPS 

The extended form of the instruction has two 128-bit encodings: 

• There are two source operands. The first source is an XMM register. The second source is a 64-bit 
memory location. The destination is an XMM register. Bits [63:0] of the source register are written 
to bits [63:0] of the destination; bits [63:0] of the source memory location are written to bits 
[127:64] of the destination. 

• The source operand is bits [127:64] of an XMM register. The destination is a 64-bit memory 
location. 

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 
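To make the "two packed single-precision values" concrete, the following sketch packs floats into 64-bit and 128-bit integers with the standard `struct` module and applies the extended load. The helper names are hypothetical; the little-endian packing places the first value in the lowest-order bits.

```python
import struct

MASK64 = (1 << 64) - 1

def pack_singles(*values: float) -> int:
    """Pack single-precision values into an integer, first value in the lowest bits."""
    return int.from_bytes(struct.pack(f"<{len(values)}f", *values), "little")

def unpack_xmm_singles(xmm: int) -> tuple:
    """Return the four single-precision lanes of a 128-bit value, lowest first."""
    return struct.unpack("<4f", xmm.to_bytes(16, "little"))

def vmovhps_load(src_xmm: int, mem64: int) -> int:
    """Extended load: bits [63:0] from the register source, bits [127:64] from memory."""
    return ((mem64 & MASK64) << 64) | (src_xmm & MASK64)

def movhps_store(src_xmm: int) -> int:
    """Store: the two values in bits [127:64] are written to memory."""
    return (src_xmm >> 64) & MASK64
```

Loading the pair (3.0, 4.0) over a source holding (1.0, 2.0, 9.0, 9.0) gives the lanes (1.0, 2.0, 3.0, 4.0) in this model.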


Instruction Support

Form        Subset  Feature Flag
MOVHPS      SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVHPS     AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                     Opcode       Description
MOVHPS xmm1, mem64           0F 16 /r     Moves two packed single-precision floating-point values from
                                          mem64 to xmm1[127:64].
MOVHPS mem64, xmm1           0F 17 /r     Moves two packed single-precision floating-point values from
                                          xmm1[127:64] to mem64.

                             Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVHPS xmm1, xmm2, mem64    C4   RXB.00001       X.src.0.00   16 /r
VMOVHPS mem64, xmm1          C4   RXB.00001       X.1111.0.00  17 /r


Related Instructions 

(V)MOVAPS, (V)MOVHLPS, (V)MOVLHPS, (V)MOVLPS, (V)MOVMSKPS, (V)MOVSS, 
(V)MOVUPS 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b (for memory destination encoding only).
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 


MOVLHPS Move Low to High

VMOVLHPS Packed Single-Precision Floating-Point 

Moves two packed single-precision floating-point values from the low quadword of an XMM register 
to the high quadword of a second XMM register. 

There are legacy and extended forms of the instruction: 

MOVLHPS 

The source operand is bits [63:0] of an XMM register. The destination is bits [127:64] of an XMM 
register. Bits [255:128] of the YMM register that corresponds to the destination are not affected. 

VMOVLHPS 

The extended form of the instruction has a 128-bit encoding only.

The source operands are bits [63:0] of two XMM registers. The destination is a third XMM register.
Bits [63:0] of the first source are moved to bits [63:0] of the destination; bits [63:0] of the second
source are moved to bits [127:64] of the destination. Bits [255:128] of the YMM register that
corresponds to the destination are cleared.
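A short model of both forms, assuming registers are represented as Python integers. The function names are hypothetical; the code only restates the bit ranges given in the text.

```python
MASK64 = (1 << 64) - 1

def movlhps(dest_ymm: int, src_xmm: int) -> int:
    """Legacy form: write bits [127:64]; bits [63:0] and [255:128] are kept."""
    upper = (dest_ymm >> 128) << 128
    low = dest_ymm & MASK64
    return upper | ((src_xmm & MASK64) << 64) | low

def vmovlhps(src1_xmm: int, src2_xmm: int) -> int:
    """Extended form: low quadwords of both sources; bits [255:128] cleared."""
    low1 = src1_xmm & MASK64              # goes to destination bits [63:0]
    low2 = src2_xmm & MASK64              # goes to destination bits [127:64]
    return (low2 << 64) | low1
```

As with the high-to-low variant, the two forms produce the same bits [127:0] when the legacy destination doubles as the first source of the extended form.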

Instruction Support

Form        Subset  Feature Flag
MOVLHPS     SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVLHPS    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                     Opcode       Description
MOVLHPS xmm1, xmm2           0F 16 /r     Moves two packed single-precision floating-point
                                          values from xmm2[63:0] to xmm1[127:64].

                             Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVLHPS xmm1, xmm2, xmm3    C4   RXB.00001       X.src.0.00   16 /r


Related Instructions 

(V)MOVAPS, (V)MOVHLPS, (V)MOVHPS, (V)MOVLPS, (V)MOVMSKPS, (V)MOVSS, 
(V)MOVUPS 
Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 


MOVLPD Move Low

VMOVLPD Packed Double-Precision Floating-Point

Moves a packed double-precision floating-point value. Values can be moved from a 64-bit memory
location to the low-order quadword of an XMM register, or from the low-order quadword of an XMM
register to a 64-bit memory location.

There are legacy and extended forms of the instruction:

MOVLPD 

There are two encodings. 

• The source operand is a 64-bit memory location. The destination is bits [63:0] of an XMM register. 
Bits [255:128] of the YMM register that corresponds to the destination are not affected. 

• The source operand is bits [63:0] of an XMM register. The destination is a 64-bit memory location. 

VMOVLPD 

The extended form of the instruction has two 128-bit encodings. 

• There are two source operands. The first source is an XMM register. The second source is a 64-bit 
memory location. The destination is an XMM register. Bits [127:64] of the source register are 
written to bits [127:64] of the destination; bits [63:0] of the source memory location are written to 
bits [63:0] of the destination. Bits [255:128] of the YMM register that corresponds to the 
destination are cleared. 

• The source operand is bits [63:0] of an XMM register. The destination is a 64-bit memory location. 
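The load and store encodings above can be sketched as follows, with registers and the 64-bit memory operand represented as Python integers. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def movlpd_load(dest_ymm: int, mem64: int) -> int:
    """Legacy load: write bits [63:0]; every higher bit of the destination is kept."""
    return ((dest_ymm >> 64) << 64) | (mem64 & MASK64)

def vmovlpd_load(src_xmm: int, mem64: int) -> int:
    """Extended load: bits [127:64] from the register source, bits [63:0] from memory."""
    high = (src_xmm >> 64) & MASK64
    return (high << 64) | (mem64 & MASK64)

def movlpd_store(src_xmm: int) -> int:
    """Store (either form): the value written to memory is bits [63:0]."""
    return src_xmm & MASK64
```

Only the extended load discards what was in bits [255:128]; the legacy load leaves them as they were.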


Instruction Support

Form        Subset  Feature Flag
MOVLPD      SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVLPD     AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                     Opcode       Description
MOVLPD xmm1, mem64           66 0F 12 /r  Moves a packed double-precision floating-point value from
                                          mem64 to xmm1[63:0].
MOVLPD mem64, xmm1           66 0F 13 /r  Moves a packed double-precision floating-point value from
                                          xmm1[63:0] to mem64.

                             Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVLPD xmm1, xmm2, mem64    C4   RXB.00001       X.src.0.01   12 /r
VMOVLPD mem64, xmm1          C4   RXB.00001       X.1111.0.01  13 /r
Related Instructions

(V)MOVAPD, (V)MOVHPD, (V)MOVMSKPD, (V)MOVSD, (V)MOVUPD 

Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b (for memory destination encoding only).
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
MOVLPS Move Low Packed Single-Precision

VMOVLPS Floating-Point

Moves two packed single-precision floating-point values. Values can be moved from a 64-bit memory
location to the low-order quadword of an XMM register, or from the low-order quadword of an XMM
register to a 64-bit memory location.

There are legacy and extended forms of the instruction:

MOVLPS 

There are two encodings. 

• The source operand is a 64-bit memory location. The destination is bits [63:0] of an XMM register. 
Bits [255:128] of the YMM register that corresponds to the destination are not affected. 

• The source operand is bits [63:0] of an XMM register. The destination is a 64-bit memory location. 

VMOVLPS 

The extended form of the instruction has two 128-bit encodings.

• There are two source operands. The first source is an XMM register. The second source is a 64-bit 
memory location. The destination is an XMM register. Bits [127:64] of the source register are 
written to bits [127:64] of the destination; bits [63:0] of the source memory location are written to 
bits [63:0] of the destination. Bits [255:128] of the YMM register that corresponds to the 
destination are cleared. 

• The source operand is bits [63:0] of an XMM register. The destination is a 64-bit memory location. 
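The store and load encodings can also be pictured against a byte-addressed memory. The sketch below uses a `bytearray` as a stand-in for memory and little-endian byte order; the function names and the memory model are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1

def movlps_store(memory: bytearray, address: int, src_xmm: int) -> None:
    """Store: write bits [63:0] of the source to the 8 bytes starting at address."""
    memory[address:address + 8] = (src_xmm & MASK64).to_bytes(8, "little")

def movlps_load(dest_ymm: int, memory: bytearray, address: int) -> int:
    """Legacy load: replace bits [63:0]; every higher bit of the destination is kept."""
    mem64 = int.from_bytes(memory[address:address + 8], "little")
    return ((dest_ymm >> 64) << 64) | mem64

def vmovlps_load(src_xmm: int, memory: bytearray, address: int) -> int:
    """Extended load: bits [127:64] from the register source, bits [63:0] from memory."""
    mem64 = int.from_bytes(memory[address:address + 8], "little")
    return (((src_xmm >> 64) & MASK64) << 64) | mem64
```

Exactly eight bytes are touched by the store; bytes outside that range keep their previous contents.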


Instruction Support

Form        Subset  Feature Flag
MOVLPS      SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVLPS     AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                     Opcode       Description
MOVLPS xmm1, mem64           0F 12 /r     Moves two packed single-precision floating-point values from
                                          mem64 to xmm1[63:0].
MOVLPS mem64, xmm1           0F 13 /r     Moves two packed single-precision floating-point values from
                                          xmm1[63:0] to mem64.

                             Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVLPS xmm1, xmm2, mem64    C4   RXB.00001       X.src.0.00   12 /r
VMOVLPS mem64, xmm1          C4   RXB.00001       X.1111.0.00  13 /r
Related Instructions

(V)MOVAPS, (V)MOVHLPS, (V)MOVHPS, (V)MOVLHPS, (V)MOVMSKPS, (V)MOVSS, 
(V)MOVUPS 

Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b (for memory destination encoding only).
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
MOVMSKPD Extract Sign Mask

VMOVMSKPD Packed Double-Precision Floating-Point

Extracts the sign bits of packed double-precision floating-point values from an XMM register,
zero-extends the value, and writes it to the low-order bits of a general-purpose register.

There are legacy and extended forms of the instruction:

MOVMSKPD

Extracts two mask bits.

The source operand is an XMM register. The destination can be either a 64-bit or a 32-bit
general-purpose register. Writes the extracted bits to positions [1:0] of the destination and clears the
remaining bits. Bits [255:128] of the YMM register that corresponds to the source are not affected.

VMOVMSKPD

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

Extracts two mask bits.

The source operand is an XMM register. The destination can be either a 64-bit or a 32-bit
general-purpose register. Writes the extracted bits to positions [1:0] of the destination and clears the
remaining bits. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

Extracts four mask bits.

The source operand is a YMM register. The destination can be either a 64-bit or a 32-bit
general-purpose register. Writes the extracted bits to positions [3:0] of the destination and clears the
remaining bits.
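The mask extraction can be modeled by collecting bit 63 of each 64-bit element. In the sketch below the `lanes` parameter selects the XMM case (2) or the YMM case (4); the function names and the use of `struct` to build test values are assumptions made for illustration.

```python
import struct

def pack_doubles(*values: float) -> int:
    """Pack double-precision values into an integer, first value in the lowest bits."""
    return int.from_bytes(struct.pack(f"<{len(values)}d", *values), "little")

def movmskpd(src: int, lanes: int = 2) -> int:
    """Return the sign bits of `lanes` packed doubles, element i in mask bit i."""
    mask = 0
    for i in range(lanes):
        sign = (src >> (64 * i + 63)) & 1   # bit 63 of element i
        mask |= sign << i
    return mask
```

Because only the sign bit is examined, negative zero contributes a 1 to the mask just as any other negative value does; all mask bits above the element count are zero.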


Instruction Support

Form        Subset  Feature Flag
MOVMSKPD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVMSKPD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                     Opcode       Description
MOVMSKPD reg, xmm            66 0F 50 /r  Move zero-extended sign bits of packed double-precision
                                          values from xmm to a general-purpose register.

                             Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVMSKPD reg, xmm           C4   RXB.00001       X.1111.0.01  50 /r
VMOVMSKPD reg, ymm           C4   RXB.00001       X.1111.1.01  50 /r
Related Instructions

(V)MOVMSKPS, (V)PMOVMSKB 

Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
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MOVMSKPS Extract Sign Mask 

VMOVMSKPS Packed Single-Precision Floating-Point 

Extracts the sign bits of packed single-precision floating-point values from an XMM register, zero- 
extends the value, and writes it to the low-order bits of a general-purpose register. 

There are legacy and extended forms of the instruction:

MOVMSKPS

Extracts four mask bits.

The source operand is an XMM register. The destination can be either a 64-bit or a 32-bit
general-purpose register. Writes the extracted bits to positions [3:0] of the destination and clears
the remaining bits.

VMOVMSKPS

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

Extracts four mask bits.

The source operand is an XMM register. The destination can be either a 64-bit or a 32-bit
general-purpose register. Writes the extracted bits to positions [3:0] of the destination and clears
the remaining bits.

YMM Encoding

Extracts eight mask bits.

The source operand is a YMM register. The destination can be either a 64-bit or a 32-bit
general-purpose register. Writes the extracted bits to positions [7:0] of the destination and clears
the remaining bits.
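
The sign-mask extraction described above can be modelled in a few lines of Python. This is a minimal sketch of the architectural result only, not of any hardware implementation. The function names movmskps and pack_floats, and the choice to represent a register as a Python integer with element 0 in the low-order bits, are assumptions made for illustration.

```python
import struct


def movmskps(src: int, width_bits: int = 128) -> int:
    """Model of (V)MOVMSKPS: collect the sign bit of each 32-bit element.

    src is the XMM (128-bit) or YMM (256-bit) register image as an integer.
    The return value is the zero-extended mask written to the destination.
    """
    if width_bits not in (128, 256):
        raise ValueError("width_bits must be 128 or 256")
    mask = 0
    for i in range(width_bits // 32):
        sign = (src >> (32 * i + 31)) & 1  # bit 31 of element i
        mask |= sign << i                  # lands in bit i of the result
    return mask


def pack_floats(values) -> int:
    """Build a register image from single-precision values, element 0 lowest."""
    raw = struct.pack("<%df" % len(values), *values)
    return int.from_bytes(raw, "little")


xmm = pack_floats([1.0, -2.0, 3.0, -0.0])
print(bin(movmskps(xmm)))  # prints 0b1010: elements 1 and 3 have the sign bit set
```

Note that negative zero contributes a 1 to the mask, because only the sign bit is examined and not the numeric value.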


Instruction Support

Form         Subset   Feature Flag
MOVMSKPS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVMSKPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic              Opcode        Description
MOVMSKPS reg, xmm     0F 50 /r      Move zero-extended sign bits of packed single-precision
                                    values from xmm to a general-purpose register.

Mnemonic              Encoding
                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVMSKPS reg, xmm    C4    RXB.00001        X.1111.0.00   50 /r
VMOVMSKPS reg, ymm    C4    RXB.00001        X.1111.1.00   50 /r

Related Instructions

(V)MOVMSKPD, (V)PMOVMSKB 

Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.

X = AVX and SSE exception
A = AVX exception
S = SSE exception

MOVNTDQ Move Non-Temporal

VMOVNTDQ Double Quadword 

Moves double quadword values from a register to a memory location. 

Indicates to the processor that the data is non-temporal, and is unlikely to be used again soon. The
processor treats the store as a write-combining (WC) memory write, which minimizes cache pollution.
The method of minimization depends on the hardware implementation of the instruction. For
further information, see “Memory Optimization” in Volume 1.

The instruction is weakly-ordered with respect to other instructions that operate on memory. Software
should use an SFENCE or MFENCE instruction to force strong memory ordering of MOVNTDQ
with respect to other stores.

An attempted store to a non-aligned memory location results in a #GP exception.

There are legacy and extended forms of the instruction:

MOVNTDQ 

Moves one 128-bit value. 

The source operand is an XMM register. The destination is a 128-bit memory location. 

VMOVNTDQ 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 
Moves one 128-bit value. 

The source operand is an XMM register. The destination is a 128-bit memory location. 

YMM Encoding 

Moves two 128-bit values. 

The source operand is a YMM register. The destination is a 256-bit memory location. 
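
The alignment requirement can be checked in software before a store is issued. The exceptions table for this instruction lists a 16-byte boundary for the 128-bit forms and a 32-byte boundary for the 256-bit form. The following is a minimal Python sketch of that rule, assuming the effective address is available as an integer. The helper names required_alignment, is_aligned_for_movntdq, and align_up are hypothetical and are not part of any AMD-provided interface.

```python
def required_alignment(operand_bits: int) -> int:
    """Alignment in bytes for a 128-bit (XMM) or 256-bit (YMM) memory operand."""
    if operand_bits not in (128, 256):
        raise ValueError("operand_bits must be 128 or 256")
    return operand_bits // 8


def is_aligned_for_movntdq(address: int, operand_bits: int) -> bool:
    """True if a (V)MOVNTDQ store to this address would not fault on alignment."""
    return address % required_alignment(operand_bits) == 0


def align_up(address: int, operand_bits: int) -> int:
    """Round address up to the next boundary that satisfies the requirement."""
    alignment = required_alignment(operand_bits)
    return (address + alignment - 1) & ~(alignment - 1)


print(is_aligned_for_movntdq(0x1010, 128))  # 16-byte aligned
print(is_aligned_for_movntdq(0x1010, 256))  # not 32-byte aligned
print(hex(align_up(0x1011, 256)))
```

An address that is valid for the XMM encoding is not necessarily valid for the YMM encoding, so a buffer shared by both forms is best aligned to 32 bytes.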


Instruction Support

Form         Subset   Feature Flag
MOVNTDQ      SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVNTDQ     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                Opcode        Description
MOVNTDQ mem128, xmm     66 0F E7 /r   Moves a 128-bit value from xmm to mem128, minimizing
                                      cache pollution.

Mnemonic                Encoding
                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVNTDQ mem128, xmm    C4    RXB.00001        X.1111.0.01   E7 /r
VMOVNTDQ mem256, ymm    C4    RXB.00001        X.1111.1.01   E7 /r

Related Instructions

(V)MOVNTDQA, (V)MOVNTPD, (V)MOVNTPS 


Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                             S     S     S     Memory operand not aligned on a 16-byte boundary.
                             S     S     X     Write to a read-only data segment.
                                         A     VEX256: Memory operand not 32-byte aligned.
                                               VEX128: Memory operand not 16-byte aligned.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X = AVX and SSE exception
A = AVX exception
S = SSE exception

MOVNTDQA Move Non-Temporal

VMOVNTDQA Double Quadword Aligned 

Loads an XMM/YMM register from a naturally-aligned 128-bit or 256-bit memory location. 

Indicates to the processor that the data is non-temporal, and is unlikely to be used again soon. The 
processor treats the load as a write-combining (WC) memory read, which minimizes cache pollution. 
The method of minimization depends on the hardware implementation of the instruction. For further 
information, see “Memory Optimization” in Volume 1. 

The instruction is weakly-ordered with respect to other instructions that operate on memory. Software 
should use an MFENCE instruction to force strong memory ordering of MOVNTDQA with respect 
to other reads. 

An attempted load from a non-aligned memory location results in a #GP exception. 

There are legacy and extended forms of the instruction:

MOVNTDQA 

Loads a 128-bit value into the specified XMM register from a 16-byte aligned memory location. 

VMOVNTDQA 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Loads a 128-bit value into the specified XMM register from a 16-byte aligned memory location. 

YMM Encoding 

Loads a 256-bit value into the specified YMM register from a 32-byte aligned memory location. 


Instruction Support

Form                 Subset   Feature Flag
MOVNTDQA             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VMOVNTDQA 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VMOVNTDQA 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
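
The three feature flags above can be decoded from raw CPUID register values with simple bit tests. The following Python sketch assumes the caller has already obtained ECX from CPUID Fn0000_0001 and EBX from CPUID Fn0000_0007 (subfunction 0) by some other means, since Python cannot execute CPUID directly. The function names and the sample register values are hypothetical and chosen only for illustration.

```python
def bit(value: int, position: int) -> bool:
    """Return True if the given bit position is set in value."""
    return bool((value >> position) & 1)


def movntdqa_support(fn1_ecx: int, fn7_ebx_x0: int) -> dict:
    """Map each instruction form to its CPUID feature flag.

    fn1_ecx     ECX returned by CPUID Fn0000_0001
    fn7_ebx_x0  EBX returned by CPUID Fn0000_0007, subfunction 0
    """
    return {
        "MOVNTDQA": bit(fn1_ecx, 19),            # SSE4.1
        "VMOVNTDQA 128-bit": bit(fn1_ecx, 28),   # AVX
        "VMOVNTDQA 256-bit": bit(fn7_ebx_x0, 5),  # AVX2
    }


# Made-up register images: SSE4.1 and AVX reported, AVX2 not reported.
example_ecx = (1 << 19) | (1 << 28)
example_ebx = 0
print(movntdqa_support(example_ecx, example_ebx))
```

The feature flags alone are not sufficient for the extended forms. As the exceptions table for this instruction shows, the VEX encodings also require CR4.OSXSAVE = 1 and XFEATURE_ENABLED_MASK[2:1] = 11b, which this sketch does not check.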


Instruction Encoding

Mnemonic                 Opcode           Description
MOVNTDQA xmm, mem128     66 0F 38 2A /r   Loads xmm from an aligned memory location, minimizing
                                          cache pollution.

Mnemonic                 Encoding
                         VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVNTDQA xmm, mem128    C4    RXB.02           X.1111.0.01   2A /r
VMOVNTDQA ymm, mem256    C4    RXB.02           X.1111.1.01   2A /r
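
The VEX columns of the encoding table map directly onto the three bytes of the VEX prefix. The following Python sketch assembles the bytes for VMOVNTDQA with a simple [rax] memory operand. It assumes the standard three-byte VEX layout, in which the R, X, B, and vvvv fields are stored in inverted form, so a stored value of 1 in R, X, or B means no register extension. The W field is shown as X (ignored) in the table; the sketch arbitrarily uses 0. The function names vex3 and modrm are hypothetical.

```python
def vex3(r: int, x: int, b: int, map_select: int,
         w: int, vvvv: int, l: int, pp: int) -> bytes:
    """Assemble a three-byte VEX prefix from stored (already inverted) fields."""
    byte1 = (r << 7) | (x << 6) | (b << 5) | (map_select & 0x1F)
    byte2 = (w << 7) | ((vvvv & 0xF) << 3) | (l << 2) | (pp & 0x3)
    return bytes([0xC4, byte1, byte2])


def modrm(mod: int, reg: int, rm: int) -> bytes:
    """Assemble a ModRM byte."""
    return bytes([(mod << 6) | (reg << 3) | rm])


def vmovntdqa(l: int) -> bytes:
    """Encode VMOVNTDQA reg0, [rax]; l = 0 selects xmm0, l = 1 selects ymm0."""
    prefix = vex3(r=1, x=1, b=1, map_select=0x02,
                  w=0, vvvv=0b1111, l=l, pp=0b01)
    return prefix + bytes([0x2A]) + modrm(mod=0b00, reg=0, rm=0)


print(" ".join("%02X" % byte for byte in vmovntdqa(l=1)))  # prints C4 E2 7D 2A 00
print(" ".join("%02X" % byte for byte in vmovntdqa(l=0)))  # prints C4 E2 79 2A 00
```

The two encodings differ only in the L bit of the third prefix byte, which selects between the 128-bit and 256-bit operand sizes.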


Related Instructions 

(V)MOVNTDQ, (V)MOVNTPD, (V)MOVNTPS

Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                             S     S     S     Memory operand not aligned on a 16-byte boundary.
                             S     S     X     Write to a read-only data segment.
                                         A     256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X = AVX, AVX2, and SSE exception
A = AVX, AVX2 exception
S = SSE exception

MOVNTPD Move Non-Temporal

VMOVNTPD Packed Double-Precision Floating-Point 

Moves packed double-precision floating-point values from a register to a memory location. 

Indicates to the processor that the data is non-temporal, and is unlikely to be used again soon. The
processor treats the store as a write-combining (WC) memory write, which minimizes cache pollution.
The method of minimization depends on the hardware implementation of the instruction. For
further information, see “Memory Optimization” in Volume 1.

The instruction is weakly-ordered with respect to other instructions that operate on memory. Software
should use an SFENCE or MFENCE instruction to force strong memory ordering of MOVNTPD
with respect to other stores.

An attempted store to a non-aligned memory location results in a #GP exception.

There are legacy and extended forms of the instruction:

MOVNTPD

Moves two values.

The source operand is an XMM register. The destination is a 128-bit memory location.

VMOVNTPD

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding 

Moves two values. 

The source operand is an XMM register. The destination is a 128-bit memory location. 

YMM Encoding 

Moves four values. 

The source operand is a YMM register. The destination is a 256-bit memory location. 


Instruction Support

Form         Subset   Feature Flag
MOVNTPD      SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVNTPD     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                Opcode        Description
MOVNTPD mem128, xmm     66 0F 2B /r   Moves two packed double-precision floating-point values
                                      from xmm to mem128, minimizing cache pollution.

Mnemonic                Encoding
                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVNTPD mem128, xmm    C4    RXB.00001        X.1111.0.01   2B /r
VMOVNTPD mem256, ymm    C4    RXB.00001        X.1111.1.01   2B /r

Related Instructions

MOVNTDQ, MOVNTI, MOVNTPS, MOVNTQ 


Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                             S     S     S     Memory operand not aligned on a 16-byte boundary.
                             S     S     X     Write to a read-only data segment.
                                         A     VEX256: Memory operand not 32-byte aligned.
                                               VEX128: Memory operand not 16-byte aligned.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X = AVX and SSE exception
A = AVX exception
S = SSE exception

MOVNTPS Move Non-Temporal

VMOVNTPS Packed Single-Precision Floating-Point 

Moves packed single-precision floating-point values from a register to a memory location. 

Indicates to the processor that the data is non-temporal, and is unlikely to be used again soon. The
processor treats the store as a write-combining (WC) memory write, which minimizes cache pollution.
The method of minimization depends on the hardware implementation of the instruction. For
further information, see “Memory Optimization” in Volume 1.

The instruction is weakly-ordered with respect to other instructions that operate on memory. Software
should use an SFENCE or MFENCE instruction to force strong memory ordering of MOVNTPS
with respect to other stores.

An attempted store to a non-aligned memory location results in a #GP exception.

There are legacy and extended forms of the instruction:

MOVNTPS

Moves four values.

The source operand is an XMM register. The destination is a 128-bit memory location.

VMOVNTPS

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding 

Moves four values. 

The source operand is an XMM register. The destination is a 128-bit memory location. 

YMM Encoding 

Moves eight values. 

The source operand is a YMM register. The destination is a 256-bit memory location. 


Instruction Support

Form         Subset   Feature Flag
MOVNTPS      SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVNTPS     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                Opcode        Description
MOVNTPS mem128, xmm     0F 2B /r      Moves four packed single-precision floating-point values
                                      from xmm to mem128, minimizing cache pollution.

Mnemonic                Encoding
                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVNTPS mem128, xmm    C4    RXB.00001        X.1111.0.00   2B /r
VMOVNTPS mem256, ymm    C4    RXB.00001        X.1111.1.00   2B /r

Related Instructions

(V)MOVNTDQ, (V)MOVNTDQA, (V)MOVNTPD, (V)MOVNTQ 


Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                             S     S     S     Memory operand not aligned on a 16-byte boundary.
                             S     S     X     Write to a read-only data segment.
                                         A     VEX256: Memory operand not 32-byte aligned.
                                               VEX128: Memory operand not 16-byte aligned.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X = AVX and SSE exception
A = AVX exception
S = SSE exception

MOVNTSD Move Non-Temporal Scalar

Double-Precision Floating-Point 

Stores one double-precision floating-point value from an XMM register to a 64-bit memory location. 
This instruction indicates to the processor that the data is non-temporal, and is unlikely to be used 
again soon. The processor treats the store as a write-combining memory write, which minimizes cache 
pollution. 

[Diagram not reproduced: one double-precision value is copied from the XMM register to mem64.]
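
The data movement performed by the store can be modelled as follows. This is a minimal Python sketch that assumes the scalar double-precision value occupies bits [63:0] of the XMM register and that memory is little-endian. The function name movntsd and the bytearray standing in for memory are assumptions for illustration. The non-temporal hint affects only cache behavior and is not modelled.

```python
import struct


def movntsd(memory: bytearray, address: int, xmm: int) -> None:
    """Store bits [63:0] of the XMM register image to 8 bytes of memory."""
    low64 = xmm & ((1 << 64) - 1)
    memory[address:address + 8] = low64.to_bytes(8, "little")


memory = bytearray(24)
low = int.from_bytes(struct.pack("<d", 1.5), "little")
xmm = (0xDEADBEEF << 64) | low  # upper quadword is not stored
movntsd(memory, 8, xmm)
print(struct.unpack_from("<d", memory, 8)[0])  # prints 1.5
```

Only the eight destination bytes change. The upper quadword of the source register does not reach memory.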



Instruction Support

Form       Subset   Feature Flag
MOVNTSD    SSE4A    CPUID Fn8000_0001_ECX[SSE4A] (bit 6)


Software must check the CPUID bit once per program or library initialization before using the 
instruction, or inconsistent behavior may result. For more on using the CPUID instruction to obtain 
processor feature support information, see Appendix E of Volume 3. 


Instruction Encoding

Mnemonic              Opcode        Description
MOVNTSD mem64, xmm    F2 0F 2B /r   Stores one double-precision floating-point XMM
                                    register value into a 64-bit memory location. Treat as
                                    a non-temporal store.


Related Instructions 

MOVNTDQ, MOVNTI, MOVNTPD, MOVNTPS, MOVNTQ, MOVNTSS 

rFLAGS Affected 

None

Exceptions

Exception                    Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD          X     X             X          The SSE4A instructions are not supported, as
                                                            indicated by CPUID Fn8000_0001_ECX[SSE4A] = 0.
                             X     X             X          The emulate bit (CR0.EM) was set to 1.
                             X     X             X          The operating-system FXSAVE/FXRSTOR support bit
                                                            (CR4.OSFXSR) was cleared to 0.
Device not available, #NM    X     X             X          The task-switch bit (CR0.TS) was set to 1.
Stack, #SS                   X     X             X          A memory address exceeded the stack segment limit
                                                            or was non-canonical.
General protection, #GP      X     X             X          A memory address exceeded a data segment limit or
                                                            was non-canonical.
                                                 X          A null data segment was used to reference memory.
                                                 X          The destination operand was in a non-writable
                                                            segment.
Page fault, #PF                    X             X          A page fault resulted from executing the instruction.
Alignment check, #AC               X             X          An unaligned memory reference was performed while
                                                            alignment checking was enabled.

MOVNTSS Move Non-Temporal Scalar

Single-Precision Floating-Point 

Stores one single-precision floating-point value from an XMM register to a 32-bit memory location. 
This instruction indicates to the processor that the data is non-temporal, and is unlikely to be used 
again soon. The processor treats the store as a write-combining memory write, which minimizes cache 
pollution. 

[Diagram not reproduced: one single-precision value is copied from the XMM register to mem32.]



Instruction Support

Form       Subset   Feature Flag
MOVNTSS    SSE4A    CPUID Fn8000_0001_ECX[SSE4A] (bit 6)


Software must check the CPUID bit once per program or library initialization before using the 
instruction, or inconsistent behavior may result. For more on using the CPUID instruction to obtain 
processor feature support information, see Appendix E of Volume 3. 


Instruction Encoding

Mnemonic              Opcode        Description
MOVNTSS mem32, xmm    F3 0F 2B /r   Stores one single-precision floating-point XMM
                                    register value into a 32-bit memory location. Treat as
                                    a non-temporal store.


Related Instructions 

MOVNTDQ, MOVNTI, MOVNTPD, MOVNTPS, MOVNTQ, MOVNTSD

rFLAGS Affected 

None

Exceptions

Exception                    Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD          X     X             X          The SSE4A instructions are not supported, as
                                                            indicated by CPUID Fn8000_0001_ECX[SSE4A] = 0.
                             X     X             X          The emulate bit (CR0.EM) was set to 1.
                             X     X             X          The operating-system FXSAVE/FXRSTOR support bit
                                                            (CR4.OSFXSR) was cleared to 0.
Device not available, #NM    X     X             X          The task-switch bit (CR0.TS) was set to 1.
Stack, #SS                   X     X             X          A memory address exceeded the stack segment limit
                                                            or was non-canonical.
General protection, #GP      X     X             X          A memory address exceeded a data segment limit or
                                                            was non-canonical.
                                                 X          A null data segment was used to reference memory.
                                                 X          The destination operand was in a non-writable
                                                            segment.
Page fault, #PF                    X             X          A page fault resulted from executing the instruction.
Alignment check, #AC               X             X          An unaligned memory reference was performed while
                                                            alignment checking was enabled.

MOVQ Move

VMOVQ Quadword 

Moves 64-bit values. The source is either the low-order quadword of an XMM register or a 64-bit
memory location. The destination is either the low-order quadword of an XMM register or a 64-bit
memory location. When the destination is a register, the 64-bit value is zero-extended to 128 bits.

There are legacy and extended forms of the instruction: 

MOVQ 

There are two encodings: 

• The source operand is either an XMM register or a 64-bit memory location. The destination is an 
XMM register. The 64-bit value is zero-extended to 128 bits. 

• The source operand is an XMM register. The destination is either an XMM register or a 64-bit 
memory location. When the destination is a register, the 64-bit value is zero-extended to 128 bits. 

Bits [255:128] of the YMM register that corresponds to the destination are not affected. 

VMOVQ 

The extended form of the instruction has three 128-bit encodings: 

• The source operand is an XMM register. The destination is an XMM register. The 64-bit value is 
zero-extended to 128 bits. 

• The source operand is a 64-bit memory location. The destination is an XMM register. The 64-bit 
value is zero-extended to 128 bits. 

• The source operand is an XMM register. The destination is either an XMM register or a 64-bit 
memory location. When the destination is a register, the 64-bit value is zero-extended to 128 bits. 

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 
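
The difference between the legacy and extended forms for a register destination can be modelled as follows. This is a minimal Python sketch in which the full 256-bit YMM register is represented as an integer. The function name movq_to_register and the sample values are hypothetical.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def movq_to_register(dest_ymm: int, source: int, vex: bool) -> int:
    """Return the new 256-bit destination image after (V)MOVQ.

    dest_ymm  previous contents of the YMM register containing the destination
    source    source value; only bits [63:0] are used
    vex       False for MOVQ (legacy), True for VMOVQ (extended)
    """
    low128 = source & MASK64  # 64-bit value zero-extended to 128 bits
    if vex:
        return low128         # bits [255:128] are cleared
    return (dest_ymm & ~MASK128) | low128  # bits [255:128] are not affected


old = (0xAAAA << 128) | (0xBBBB << 64) | 0xCCCC
src = 0x1122334455667788
print(hex(movq_to_register(old, src, vex=False)))
print(hex(movq_to_register(old, src, vex=True)))
```

In both forms bits [127:64] of the destination are zeroed. Only the treatment of bits [255:128] differs.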


Instruction Support

Form     Subset   Feature Flag
MOVQ     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVQ    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                  Opcode        Description
MOVQ xmm1, xmm2/mem64     F3 0F 7E /r   Move a zero-extended 64-bit value from xmm2 or mem64
                                        to xmm1.
MOVQ xmm1/mem64, xmm2     66 0F D6 /r   Move a 64-bit value from xmm2 to xmm1 or mem64.
                                        Zero-extends for register destination.

Mnemonic                  Encoding
                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVQ xmm1, xmm2          C4    RXB.00001        X.1111.0.10   7E /r
VMOVQ xmm1, mem64         C4    RXB.00001        X.1111.0.10   7E /r
VMOVQ xmm1/mem64, xmm2    C4    RXB.00001        X.1111.0.01   D6 /r

Related Instructions

(V)MOVD, (V)MOVDQA, (V)MOVDQU

Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                             S     S     X     Write to a read-only data segment.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.

X = AVX and SSE exception
A = AVX exception
S = SSE exception

MOVSD Move

VMOVSD Scalar Double-Precision Floating-Point 

Moves scalar double-precision floating point values. The source is either a low-order quadword of an 
XMM register or a 64-bit memory location. The destination is either a low-order quadword of an 
XMM register or a 64-bit memory location. 

There are legacy and extended forms of the instruction: 

MOVSD 

There are two encodings. 

• The source operand is either an XMM register or a 64-bit memory location. The destination is an 
XMM register. If the source operand is a register, bits [127:64] of the destination are not affected. 
If the source operand is a 64-bit memory location, the upper 64 bits of the destination are cleared. 

• The source operand is an XMM register. The destination is either an XMM register or a 64-bit 
memory location. When the destination is a register, bits [127:64] of the destination are not 
affected. 

Bits [255:128] of the YMM register that corresponds to the destination are not affected. 

VMOVSD 

The extended form of the instruction has four 128-bit encodings. Two of the encodings are functionally
equivalent.

• The source operand is a 64-bit memory location. The destination is an XMM register. The 64-bit 
value is zero-extended to 128 bits. 

• The source operand is an XMM register. The destination is a 64-bit memory location. 

• Two functionally-equivalent encodings: 

There are two source XMM registers. The destination is an XMM register. Bits [127:64] of the first 
source register are copied to bits [127:64] of the destination; the 64-bit value in bits [63:0] of the 
second source register is written to bits [63:0] of the destination. 

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

This instruction must not be confused with the MOVSD (move string doubleword) instruction of the 
general-purpose instruction set. Assemblers can distinguish the instructions by the number and type 
of operands. 
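
The merge and zero-extension rules for the scalar double-precision move can be summarized in code. This is a minimal Python sketch covering the register-destination cases described in this section, with registers represented as integers. The function names movsd_legacy, vmovsd_load, and vmovsd_merge are hypothetical.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def movsd_legacy(dest_ymm: int, src: int, src_is_memory: bool) -> int:
    """Legacy MOVSD with an XMM destination; returns the new 256-bit image."""
    upper_ymm = dest_ymm & ~MASK128  # bits [255:128] are not affected
    low = src & MASK64
    if src_is_memory:
        return upper_ymm | low       # bits [127:64] are cleared
    return upper_ymm | (dest_ymm & (MASK64 << 64)) | low  # [127:64] kept


def vmovsd_load(mem64: int) -> int:
    """VMOVSD xmm1, mem64: zero-extend; bits [255:64] end up cleared."""
    return mem64 & MASK64


def vmovsd_merge(src1: int, src2: int) -> int:
    """VMOVSD xmm1, xmm2, xmm3: [127:64] from src1, [63:0] from src2."""
    return (src1 & (MASK64 << 64)) | (src2 & MASK64)


old = (0xAAAA << 128) | (0xBBBB << 64) | 0xCCCC
print(hex(movsd_legacy(old, 0x1234, src_is_memory=False)))
print(hex(movsd_legacy(old, 0x1234, src_is_memory=True)))
print(hex(vmovsd_merge((0xBBBB << 64) | 0xCCCC, 0x1234)))
```

The legacy register-to-register form is a merge into the existing destination, whereas the extended three-operand form takes the upper quadword from an explicit first source.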


Instruction Support

Form      Subset   Feature Flag
MOVSD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVSD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                   Opcode        Description
MOVSD xmm1, xmm2/mem64     F2 0F 10 /r   Moves a 64-bit value from xmm2 or mem64 to xmm1. Zero
                                         extends to 128 bits when source operand is memory.
MOVSD xmm1/mem64, xmm2     F2 0F 11 /r   Moves a 64-bit value from xmm2 to xmm1 or mem64.

Mnemonic                            Encoding (see Note 1)
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVSD xmm1, mem64                  C4    RXB.00001        X.1111.X.11   10 /r
VMOVSD mem64, xmm1                  C4    RXB.00001        X.1111.X.11   11 /r
VMOVSD xmm1, xmm2, xmm3 (Note 2)    C4    RXB.00001        X.src.X.11    10 /r
VMOVSD xmm1, xmm2, xmm3 (Note 2)    C4    RXB.00001        X.src.X.11    11 /r

Note 1: The addressing mode differentiates between the two operand form (where one operand is a memory location) and
the three operand form (where all operands are held in registers).

Note 2: These two encodings are functionally equivalent.


Related Instructions 

(V)MOVAPD, (V)MOVHPD, (V)MOVLPD, (V)MOVMSKPD, (V)MOVUPD 


Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b (for memory destination encoding only).
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                             S     S     X     Write to a read-only data segment.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.

X = AVX and SSE exception
A = AVX exception
S = SSE exception

MOVSHDUP Move High and Duplicate

VMOVSHDUP Single-Precision 

Moves and duplicates odd-indexed single-precision floating-point values. 

There are legacy and extended forms of the instruction: 

MOVSHDUP 

Moves and duplicates two odd-indexed single-precision floating-point values. 

The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [127:96] of the source are duplicated and written to bits [127:96] and [95:64] of the
destination. Bits [63:32] of the source are duplicated and written to bits [63:32] and [31:0] of the
destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMOVSHDUP

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

Moves and duplicates two odd-indexed single-precision floating-point values.

The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [127:96] of the source are duplicated and written to bits [127:96] and [95:64] of the
destination. Bits [63:32] of the source are duplicated and written to bits [63:32] and [31:0] of the
destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

Moves and duplicates four odd-indexed single-precision floating-point values.

The source operand is a YMM register or a 256-bit memory location. The destination is a YMM
register. Bits [255:224] of the source are duplicated and written to bits [255:224] and [223:192] of the
destination. Bits [191:160] of the source are duplicated and written to bits [191:160] and [159:128] of
the destination. Bits [127:96] of the source are duplicated and written to bits [127:96] and [95:64] of
the destination. Bits [63:32] of the source are duplicated and written to bits [63:32] and [31:0] of the
destination.
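
The bit ranges listed above follow a single pattern: within each 64-bit pair of elements, the odd-indexed (upper) element is copied into both positions. The following Python sketch expresses that pattern for both operand widths. It models the register as an integer with element 0 in bits [31:0] and does not model the handling of bits [255:128] for the 128-bit forms. The function name movshdup is hypothetical.

```python
def movshdup(src: int, width_bits: int = 128) -> int:
    """Duplicate each odd-indexed 32-bit element into the element below it."""
    if width_bits not in (128, 256):
        raise ValueError("width_bits must be 128 or 256")
    result = 0
    for pair in range(width_bits // 64):
        odd = (src >> (64 * pair + 32)) & 0xFFFFFFFF  # upper element of the pair
        result |= (odd | (odd << 32)) << (64 * pair)  # write it to both slots
    return result


src = 0x44444444_33333333_22222222_11111111
print(hex(movshdup(src)))  # prints 0x44444444444444442222222222222222
```

The 256-bit case applies the same loop to four pairs instead of two, which matches the four bit-range statements in the YMM Encoding description.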

Instruction Support

Form         Subset   Feature Flag
MOVSHDUP     SSE3     CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VMOVSHDUP    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                       Opcode        Description
MOVSHDUP xmm1, xmm2/mem128     F3 0F 16 /r   Moves and duplicates two odd-indexed single-precision
                                             floating-point values in xmm2 or mem128.
                                             Writes to xmm1.

Mnemonic                       Encoding
                               VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVSHDUP xmm1, xmm2/mem128    C4    RXB.00001        X.1111.0.10   16 /r
VMOVSHDUP ymm1, ymm2/mem256    C4    RXB.00001        X.1111.1.10   16 /r


Related Instructions 

(V)MOVDDUP, (V)MOVSLDUP 


Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X = AVX and SSE exception
A = AVX exception
S = SSE exception

MOVSLDUP Move Low and Duplicate

VMOVSLDUP Single-Precision 

Moves and duplicates even-indexed single-precision floating-point values. 

There are legacy and extended forms of the instruction: 

MOVSLDUP 

Moves and duplicates two even-indexed single-precision floating-point values. 

The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [95:64] of the source are duplicated and written to bits [127:96] and [95:64] of the
destination. Bits [31:0] of the source are duplicated and written to bits [63:32] and [31:0] of the
destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMOVSLDUP 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Moves and duplicates two even-indexed single-precision floating-point values. 

The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [95:64] of the source are duplicated and written to bits [127:96] and [95:64] of the
destination. Bits [31:0] of the source are duplicated and written to bits [63:32] and [31:0] of the
destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding 

Moves and duplicates four even-indexed single-precision floating-point values. 

The source operand is a YMM register or a 256-bit memory location. The destination is a YMM
register. Bits [223:192] of the source are duplicated and written to bits [255:224] and [223:192] of the
destination. Bits [159:128] of the source are duplicated and written to bits [191:160] and [159:128] of
the destination. Bits [95:64] of the source are duplicated and written to bits [127:96] and [95:64] of
the destination. Bits [31:0] of the source are duplicated and written to bits [63:32] and [31:0] of the
destination.
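
The element movement described above can be sketched in a few lines of Python. This is an illustrative model only, not vendor code: a register is assumed to be a list of 4 (XMM) or 8 (YMM) values with index 0 holding bits [31:0], the function names movsldup and movshdup are hypothetical helpers, and the handling of bits [255:128] for the 128-bit forms is not modeled.

```python
def movsldup(src):
    """Duplicate each even-indexed element into the odd-indexed slot above it.

    src is a list of 4 (XMM) or 8 (YMM) values; index 0 holds bits [31:0].
    """
    if len(src) not in (4, 8):
        raise ValueError("expected 4 or 8 single-precision elements")
    out = []
    for k in range(0, len(src), 2):
        out += [src[k], src[k]]      # elements k and k+1 both receive src[k]
    return out


def movshdup(src):
    """Companion operation: duplicate each odd-indexed element downward."""
    if len(src) not in (4, 8):
        raise ValueError("expected 4 or 8 single-precision elements")
    out = []
    for k in range(1, len(src), 2):
        out += [src[k], src[k]]      # elements k-1 and k both receive src[k]
    return out


print(movsldup([10.0, 11.0, 12.0, 13.0]))
print(movshdup([10.0, 11.0, 12.0, 13.0]))
```

The same two functions cover the YMM encoding when given eight elements, because the duplication pattern repeats identically in each 128-bit half.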

Instruction Support

Form        Subset  Feature Flag
MOVSLDUP    SSE3    CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VMOVSLDUP   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                      Opcode        Description
MOVSLDUP xmm1, xmm2/mem128    F3 0F 12 /r   Moves and duplicates two even-indexed single-precision floating-point values in xmm2 or mem128. Writes to xmm1.

Mnemonic                      Encoding
                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVSLDUP xmm1, xmm2/mem128   C4    RXB.00001        X.1111.0.10   12 /r
VMOVSLDUP ymm1, ymm2/mem256   C4    RXB.00001        X.1111.1.10   12 /r


Related Instructions 

(V)MOVDDUP, (V)MOVSHDUP 


Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X - AVX and SSE exception
A - AVX exception
S - SSE exception

MOVSS Move

VMOVSS Scalar Single-Precision Floating-Point 

Moves scalar single-precision floating-point values. The source is either the low-order doubleword of
an XMM register or a 32-bit memory location. The destination is either the low-order doubleword of an
XMM register or a 32-bit memory location.

There are legacy and extended forms of the instruction: 

MOVSS 

There are three encodings. 

• The source operand is an XMM register. The destination is an XMM register. Bits [127:32] of the 
destination are not affected. 

• The source operand is a 32-bit memory location. The destination is an XMM register. The 32-bit 
value is zero-extended to 128 bits. 

• The source operand is an XMM register. The destination is either an XMM register or a 32-bit 
memory location. When the destination is a register, bits [127:32] of the destination are not 
affected. 

Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMOVSS 

The extended form of the instruction has four 128-bit encodings. Two of the encodings are
functionally equivalent.

• The source operand is a 32-bit memory location. The destination is an XMM register. The 32-bit 
value is zero-extended to 128 bits. 

• The source operand is an XMM register. The destination is a 32-bit memory location. 

• Two functionally-equivalent encodings: 

There are two source XMM registers. The destination is an XMM register. Bits [127:32] of the first
source register are copied to bits [127:32] of the destination; the 32-bit value in bits [31:0] of the
second source register is written to bits [31:0] of the destination.

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 
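
The difference between merging, zero-extending, and clearing the upper bits is easy to lose in prose, so the following Python sketch models it with integers. It is a hedged illustration under stated assumptions: a YMM register is assumed to be a 256-bit Python integer, all function names are hypothetical, and the three-operand form is modeled as taking bits [127:32] from the first source, which is the architectural behavior of a scalar single-precision merge.

```python
MASK32 = (1 << 32) - 1
MASK128 = (1 << 128) - 1


def movss_reg_reg(dest_ymm, src_xmm):
    """Legacy MOVSS xmm1, xmm2: only bits [31:0] of the destination change."""
    return (dest_ymm & ~MASK32) | (src_xmm & MASK32)


def movss_load(dest_ymm, mem32):
    """Legacy MOVSS xmm1, mem32: zero-extend to 128 bits, keep bits [255:128]."""
    return (dest_ymm & ~MASK128) | (mem32 & MASK32)


def vmovss_load(mem32):
    """VMOVSS xmm1, mem32: zero-extend to 128 bits, clear bits [255:128]."""
    return mem32 & MASK32


def vmovss_merge(src1, src2):
    """VMOVSS xmm1, xmm2, xmm3: bits [127:32] from src1, bits [31:0] from src2.

    Bits [255:128] of the result are cleared.
    """
    return (src1 & MASK128 & ~MASK32) | (src2 & MASK32)


old = (0xAAAA << 128) | (0x1111222233334444 << 64) | 0x5555666677778888
print(hex(movss_reg_reg(old, 0xDEADBEEF)))
print(hex(movss_load(old, 0x3F800000)))
print(hex(vmovss_merge(old, 0x12345678)))
```

Store forms (register to 32-bit memory) are omitted because they simply write bits [31:0] of the source and leave every register unchanged.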


Instruction Support

Form     Subset  Feature Flag
MOVSS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVSS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                  Opcode        Description
MOVSS xmm1, xmm2          F3 0F 10 /r   Moves a 32-bit value from xmm2 to xmm1.
MOVSS xmm1, mem32         F3 0F 10 /r   Moves a zero-extended 32-bit value from mem32 to xmm1.
MOVSS xmm2/mem32, xmm1    F3 0F 11 /r   Moves a 32-bit value from xmm1 to xmm2 or mem32.

Mnemonic                           Encoding (see Note 1)
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVSS xmm1, mem32                 C4    RXB.00001        X.1111.X.10   10 /r
VMOVSS mem32, xmm1                 C4    RXB.00001        X.1111.X.10   11 /r
VMOVSS xmm1, xmm2, xmm3 (Note 2)   C4    RXB.00001        X.src.X.10    10 /r
VMOVSS xmm1, xmm2, xmm3 (Note 2)   C4    RXB.00001        X.src.X.10    11 /r

Note 1: The addressing mode differentiates between the two operand form (where one operand is a memory location) and
the three operand form (where all operands are held in registers).

Note 2: These two encodings are functionally equivalent.


Related Instructions 

(V)MOVAPS, (V)MOVHLPS, (V)MOVHPS, (V)MOVLHPS, (V)MOVLPS, (V)MOVMSKPS, 
(V)MOVUPS 


Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b (for memory destination encoding only).
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X - AVX and SSE exception
A - AVX exception
S - SSE exception

MOVUPD Move Unaligned

VMOVUPD Packed Double-Precision Floating-Point 

Moves packed double-precision floating-point values. Values can be moved from a register or memory
location to a register; or from a register to a register or memory location.

A memory operand that is not aligned does not cause a general-protection exception. 

There are legacy and extended forms of the instruction: 

MOVUPD 

Moves two double-precision floating-point values. There are encodings for each type of move. 

• The source operand is either an XMM register or a 128-bit memory location. The destination 
operand is an XMM register. 

• The source operand is an XMM register. The destination operand is either an XMM register or a 
128-bit memory location. 

Bits [255:128] of the YMM register that corresponds to the destination are not affected. 

VMOVUPD 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding 

Moves two double-precision floating-point values. There are encodings for each type of move. 

• The source operand is either an XMM register or a 128-bit memory location. The destination 
operand is an XMM register. 

• The source operand is an XMM register. The destination operand is either an XMM register or a 
128-bit memory location. 

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

Moves four double-precision floating-point values. There are encodings for each type of move. 

• The source operand is either a YMM register or a 256-bit memory location. The destination 
operand is a YMM register. 

• The source operand is a YMM register. The destination operand is either a YMM register or a 
256-bit memory location. 
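
As a rough software analogy for an unaligned packed move, the Python sketch below reads and writes packed doubles at an arbitrary byte offset of a buffer. Everything in it is an assumption made for illustration: memory is modeled as a bytearray, movupd_load and movupd_store are hypothetical helper names, and no exception behavior is modeled.

```python
import struct


def movupd_load(mem, offset, count=2):
    """Read `count` little-endian doubles starting at any byte offset."""
    return list(struct.unpack_from("<%dd" % count, mem, offset))


def movupd_store(mem, offset, values):
    """Write the doubles in `values` starting at any byte offset."""
    struct.pack_into("<%dd" % len(values), mem, offset, *values)


memory = bytearray(48)
movupd_store(memory, 3, [1.5, -2.25])            # offset 3 is not 16-byte aligned
xmm = movupd_load(memory, 3)                     # 128-bit form: two doubles
movupd_store(memory, 9, [1.0, 2.0, 3.0, 4.0])    # 256-bit form: four doubles
ymm = movupd_load(memory, 9, count=4)
print(xmm, ymm)
```

The point of the sketch is only that the offset need not be a multiple of the operand size for the transfer to succeed.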


Instruction Support

Form      Subset  Feature Flag
MOVUPD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVUPD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                    Opcode        Description
MOVUPD xmm1, xmm2/mem128    66 0F 10 /r   Moves two packed double-precision floating-point values from xmm2 or mem128 to xmm1.
MOVUPD xmm1/mem128, xmm2    66 0F 11 /r   Moves two packed double-precision floating-point values from xmm2 to xmm1 or mem128.

Mnemonic                    Encoding
                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVUPD xmm1, xmm2/mem128   C4    RXB.00001        X.1111.0.01   10 /r
VMOVUPD xmm1/mem128, xmm2   C4    RXB.00001        X.1111.0.01   11 /r
VMOVUPD ymm1, ymm2/mem256   C4    RXB.00001        X.1111.1.01   10 /r
VMOVUPD ymm1/mem256, ymm2   C4    RXB.00001        X.1111.1.01   11 /r


Related Instructions 

(V)MOVAPD, (V)MOVHPD, (V)MOVLPD, (V)MOVMSKPD, (V)MOVSD 


Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     X     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X - AVX and SSE exception
A - AVX exception
S - SSE exception

MOVUPS Move Unaligned

VMOVUPS Packed Single-Precision Floating-Point 

Moves packed single-precision floating-point values. Values can be moved from a register or memory 
location to a register; or from a register to a register or memory location. 

A memory operand that is not aligned does not cause a general-protection exception. 

There are legacy and extended forms of the instruction: 

MOVUPS 

Moves four single-precision floating-point values. There are encodings for each type of move. 

• The source operand is either an XMM register or a 128-bit memory location. The destination 
operand is an XMM register. 

• The source operand is an XMM register. The destination operand is either an XMM register or a 
128-bit memory location. 

Bits [255:128] of the YMM register that corresponds to the destination are not affected. 

VMOVUPS 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Moves four single-precision floating-point values. There are encodings for each type of move. 

• The source operand is either an XMM register or a 128-bit memory location. The destination 
operand is an XMM register. 

• The source operand is an XMM register. The destination operand is either an XMM register or a 
128-bit memory location. 

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

Moves eight single-precision floating-point values. There are encodings for each type of move. 

• The source operand is either a YMM register or a 256-bit memory location. The destination 
operand is a YMM register. 

• The source operand is a YMM register. The destination operand is either a YMM register or a 
256-bit memory location. 


Instruction Support

Form      Subset  Feature Flag
MOVUPS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVUPS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                    Opcode     Description
MOVUPS xmm1, xmm2/mem128    0F 10 /r   Moves four packed single-precision floating-point values from xmm2 or unaligned mem128 to xmm1.
MOVUPS xmm1/mem128, xmm2    0F 11 /r   Moves four packed single-precision floating-point values from xmm2 to xmm1 or unaligned mem128.

Mnemonic                    Encoding
                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVUPS xmm1, xmm2/mem128   C4    RXB.00001        X.1111.0.00   10 /r
VMOVUPS xmm1/mem128, xmm2   C4    RXB.00001        X.1111.0.00   11 /r
VMOVUPS ymm1, ymm2/mem256   C4    RXB.00001        X.1111.1.00   10 /r
VMOVUPS ymm1/mem256, ymm2   C4    RXB.00001        X.1111.1.00   11 /r


Related Instructions 

(V)MOVAPS, (V)MOVHLPS, (V)MOVHPS, (V)MOVLHPS, (V)MOVLPS, (V)MOVMSKPS, 
(V)MOVSS 


Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     X     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X - AVX and SSE exception
A - AVX exception
S - SSE exception

MPSADBW Multiple Sum of Absolute Differences

VMPSADBW 

Calculates 8 or 16 sums of absolute differences of sequentially selected groups of four contiguous 
unsigned byte integers in the first source operand and a selected group of four contiguous unsigned 
byte integers in a second source operand and writes the eight or sixteen 16-bit unsigned integer sums 
to sequential words of the destination register. The 256-bit form of the instruction additionally
performs a similar but independent calculation using the upper 128 bits of the source operands.

Figure 2-2 on page 238 provides a graphical representation of the operation of the instruction. The 
following description accompanies it. 

The computation uses as inputs 11 bytes from the first source operand and 4 bytes in the second 
source operand. Bit fields in the imm8 operand specify the index of the right-most byte of each group. 

Bits [1:0] of the immediate operand determine the index of the right-most byte of four contiguous
bytes within the second source operand used in the operation that produces the result (or, in the case
of the 256-bit form of the instruction, the lower 128 bits of the result). Bit 2 of the immediate operand
determines the right-most index of the 11 contiguous bytes in the first source operand used in the same
calculation. In the 128-bit form of the instruction, bits [7:3] of the immediate operand are ignored.

Bits [4:3] of the immediate operand determine the index of the right-most byte of four contiguous
bytes within the second source operand used in the operation that produces the upper 128 bits of the
result in the 256-bit form of the instruction. Bit 5 of the immediate operand determines the right-most
index of the 11 contiguous bytes within the upper half of the first 256-bit source operand used in
the same calculation. In the 256-bit form of the instruction, bits [7:6] of the immediate operand are
ignored.

Each word of the destination register receives the result of a separate computation of the sum of
absolute differences function applied to a specific pair of four-element vectors derived from the source
operands. The sum of absolute differences function SumAbsDiff(A, B) takes as input two 4-element
unsigned 8-bit integer vectors and produces a single unsigned 16-bit integer result. The function is
defined as:

SumAbsDiff(A, B) = | A[0]-B[0] | + | A[1]-B[1] | + | A[2]-B[2] | + | A[3]-B[3] |

The sum of absolute differences function produces a quantitative measure of the difference between
two 4-element vectors. Each of the calculations that generates a result uses this metric to assess the
difference between the selected 4-byte vector from operand 2 (B in the above equation) and each of
eight overlapping 4-byte vectors (A in the equation) selected sequentially from the first source
operand.

The right-most word (Word 0) of the destination receives the result of the comparison of the
right-most 4 bytes of the selected group of 11 from operand 1 (src1[i1+3:i1], as shown in the figure) to
the selected 4 bytes from operand 2 (src2[j1+3:j1], in the figure). Word 1 of the destination receives
the result of the comparison of the four bytes starting at an offset of 1 from the right-most byte of the
group of 11 (src1[i1+4:i1+1] in the figure) to the 4 bytes from operand 2. Word 2 of the destination
receives the result of the comparison of the four bytes starting at an offset of 2 from the right-most
byte of the group of 11 (src1[i1+5:i1+2], in the figure) to the selected 4 bytes from operand 2. This
continues in like manner until the left-most four bytes of the 11 are compared to the 4 bytes from
operand 2 with the result being written to Word 7. This completes the generation of the lower 128 bits
of the result.

The generation of the upper 128 bits of the result for the 256-bit form of the instruction is performed
in like manner using separately selected groups of bytes from the upper half of the 256-bit operands,
as described above.

The following is a more formal description of the operation of the (V)MPSADBW instruction: 


For both the 128-bit and 256-bit form of the instruction, the following set of operations is performed:

src1 and src2 are byte vectors that overlay the first and second source operand respectively.

dest is a word vector that overlays the destination register.

tmp1[ ] is an array of 4-element vectors derived from the first source operand.

tmp2 and tmp3 are 4-element vectors derived from the second source operand.

i1 = imm8[2] * 4
j1 = imm8[1:0] * 4

tmp1[0] = {src1[i1+3], src1[i1+2], src1[i1+1], src1[i1]}
tmp1[1] = {src1[i1+4], src1[i1+3], src1[i1+2], src1[i1+1]}
tmp1[2] = {src1[i1+5], src1[i1+4], src1[i1+3], src1[i1+2]}
tmp1[3] = {src1[i1+6], src1[i1+5], src1[i1+4], src1[i1+3]}
tmp1[4] = {src1[i1+7], src1[i1+6], src1[i1+5], src1[i1+4]}
tmp1[5] = {src1[i1+8], src1[i1+7], src1[i1+6], src1[i1+5]}
tmp1[6] = {src1[i1+9], src1[i1+8], src1[i1+7], src1[i1+6]}
tmp1[7] = {src1[i1+10], src1[i1+9], src1[i1+8], src1[i1+7]}
tmp2 = {src2[j1+3], src2[j1+2], src2[j1+1], src2[j1]}

dest[0] = SumAbsDiff(tmp1[0], tmp2)
dest[1] = SumAbsDiff(tmp1[1], tmp2)
dest[2] = SumAbsDiff(tmp1[2], tmp2)
dest[3] = SumAbsDiff(tmp1[3], tmp2)
dest[4] = SumAbsDiff(tmp1[4], tmp2)
dest[5] = SumAbsDiff(tmp1[5], tmp2)
dest[6] = SumAbsDiff(tmp1[6], tmp2)
dest[7] = SumAbsDiff(tmp1[7], tmp2)

Additionally, for the 256-bit form of the instruction, the following set of operations is performed:

i2 = imm8[5] * 4 + 16
j2 = imm8[4:3] * 4 + 16

tmp1[8] = {src1[i2+3], src1[i2+2], src1[i2+1], src1[i2]}
tmp1[9] = {src1[i2+4], src1[i2+3], src1[i2+2], src1[i2+1]}
tmp1[10] = {src1[i2+5], src1[i2+4], src1[i2+3], src1[i2+2]}
tmp1[11] = {src1[i2+6], src1[i2+5], src1[i2+4], src1[i2+3]}
tmp1[12] = {src1[i2+7], src1[i2+6], src1[i2+5], src1[i2+4]}
tmp1[13] = {src1[i2+8], src1[i2+7], src1[i2+6], src1[i2+5]}
tmp1[14] = {src1[i2+9], src1[i2+8], src1[i2+7], src1[i2+6]}
tmp1[15] = {src1[i2+10], src1[i2+9], src1[i2+8], src1[i2+7]}
tmp3 = {src2[j2+3], src2[j2+2], src2[j2+1], src2[j2]}

dest[8] = SumAbsDiff(tmp1[8], tmp3)
dest[9] = SumAbsDiff(tmp1[9], tmp3)
dest[10] = SumAbsDiff(tmp1[10], tmp3)
dest[11] = SumAbsDiff(tmp1[11], tmp3)
dest[12] = SumAbsDiff(tmp1[12], tmp3)
dest[13] = SumAbsDiff(tmp1[13], tmp3)
dest[14] = SumAbsDiff(tmp1[14], tmp3)
dest[15] = SumAbsDiff(tmp1[15], tmp3)
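
The formal description above translates almost directly into executable form. The Python sketch below is an illustrative model, not a reference implementation: operands are assumed to be lists of unsigned byte values with index 0 as the least-significant byte, and the names sum_abs_diff, mpsadbw_lane, mpsadbw and vmpsadbw256 are hypothetical. For the upper half, the sketch slices bytes 16 through 31 and uses offsets without the +16 term, which selects the same bytes as i2 and j2.

```python
def sum_abs_diff(a, b):
    """SumAbsDiff(A, B) for two 4-element unsigned byte vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mpsadbw_lane(src1, src2, i, j):
    """Eight sums for one 128-bit lane; src1 and src2 hold 16 bytes each."""
    ref = src2[j:j + 4]                           # tmp2 (or tmp3)
    return [sum_abs_diff(src1[i + k:i + k + 4], ref) for k in range(8)]


def mpsadbw(src1, src2, imm8):
    """128-bit form: imm8[2] selects i1, imm8[1:0] selects j1."""
    i1 = ((imm8 >> 2) & 1) * 4
    j1 = (imm8 & 3) * 4
    return mpsadbw_lane(src1, src2, i1, j1)


def vmpsadbw256(src1, src2, imm8):
    """256-bit form: independent calculation per 128-bit half."""
    low = mpsadbw(src1[:16], src2[:16], imm8)
    i2 = ((imm8 >> 5) & 1) * 4                    # offset within the upper half
    j2 = ((imm8 >> 3) & 3) * 4
    high = mpsadbw_lane(src1[16:], src2[16:], i2, j2)
    return low + high


print(mpsadbw(list(range(16)), [10] * 16, 0))
print(mpsadbw(list(range(16)), [10] * 16, 4))
```

With i1 at its maximum of 4 and k at 7, the highest byte read from the first operand is index 14, which matches the 11-byte window (i1 through i1+10) described in the text.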

[Figure: the eight overlapping 4-byte groups src1[i1+3:i1] through src1[i1+10:i1+7] feed the destination XMM register (lower half of the YMM register); the groups src1[i2+3:i2] through src1[i2+10:i2+7] feed the upper half of the destination YMM register.]

Notes:

• i1 is a byte offset into source operand 1 (i1 = imm8[2] * 4).

• j1 is a byte offset into source operand 2 (j1 = imm8[1:0] * 4).

• i2 is a second byte offset into source operand 1 (i2 = imm8[5] * 4 + 16).

• j2 is a second byte offset into source operand 2 (j2 = imm8[4:3] * 4 + 16).

• Σ|Δ| represents the sum of absolute differences function which operates on two
4-element unsigned packed byte values and produces an unsigned 16-bit integer.

Figure 2-2. (V)MPSADBW Instruction

There are legacy and extended forms of the instruction: 

MPSADBW 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.

VMPSADBW

The extended form of the instruction has 128-bit and 256-bit encodings: 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. Bits [127:0] of the destination 
receive the results of the first 8 sums of absolute differences calculation using the selected bytes of the 
lower halves of the two source operands. Bits [255:128] of the destination receive the results of the 
second 8 sums of absolute differences calculation using selected bytes of the upper halves of the two 
source operands. 

Instruction Support

Form               Subset  Feature Flag
MPSADBW            SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VMPSADBW 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VMPSADBW 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                                 Opcode              Description
MPSADBW xmm1, xmm2/mem128, imm8          66 0F 3A 42 /r ib   Sums absolute difference of groups of four 8-bit integers in xmm1 and xmm2 or mem128. Writes results to xmm1.

Mnemonic                                 Encoding
                                         VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMPSADBW xmm1, xmm2, xmm3/mem128, imm8   C4    RXB.03           X.src1.0.01   42 /r ib
VMPSADBW ymm1, ymm2, ymm3/mem256, imm8   C4    RXB.03           X.src1.1.01   42 /r ib


Related Instructions

(V)PSADBW, (V)PABSB, (V)PABSD, (V)PABSW

Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

MULPD Multiply

VMULPD Packed Double-Precision Floating-Point 

Multiplies each packed double-precision floating-point value of the first source operand by the
corresponding packed double-precision floating-point value of the second source operand and writes the
product of each multiplication into the corresponding quadword of the destination.

There are legacy and extended forms of the instruction: 

MULPD 

Multiplies two double-precision floating-point values in the first source XMM register by the
corresponding double-precision floating-point values in either a second XMM register or a 128-bit
memory location. The first source register is also the destination. Bits [255:128] of the YMM register
that corresponds to the destination are not affected.

VMULPD 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding 

Multiplies two double-precision floating-point values in the first source XMM register by the
corresponding double-precision floating-point values in either a second source XMM register or a 128-bit
memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that
corresponds to the destination are cleared.

YMM Encoding 

Multiplies four double-precision floating-point values in the first source YMM register by the
corresponding double-precision floating-point values in either a second source YMM register or a 256-bit
memory location. The destination is a third YMM register.
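
The three forms differ mainly in which register is written and what happens to the upper 128 bits. The Python sketch below illustrates that under simplifying assumptions: a YMM register is assumed to be a list of four Python floats (IEEE 754 doubles) with index 0 as the low quadword, the function names are hypothetical, and MXCSR rounding control, exception flags and fault conditions are not modeled.

```python
def mulpd(dest, src):
    """Legacy MULPD: dest is both first source and destination.

    Only the low two doubles are multiplied; the upper two are preserved.
    """
    return [dest[0] * src[0], dest[1] * src[1], dest[2], dest[3]]


def vmulpd_xmm(src1, src2):
    """VMULPD, XMM encoding: low two products, bits [255:128] cleared."""
    return [src1[0] * src2[0], src1[1] * src2[1], 0.0, 0.0]


def vmulpd_ymm(src1, src2):
    """VMULPD, YMM encoding: four independent products."""
    return [a * b for a, b in zip(src1, src2)]


a = [1.5, 2.0, 7.0, 8.0]
b = [2.0, 0.5, 9.0, 9.0]
print(mulpd(a, b))
print(vmulpd_xmm(a, b))
print(vmulpd_ymm(a, b))
```

Returning a new list rather than mutating an argument mirrors the extended forms, where the destination is a separate third register.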

Instruction Support

Form     Subset  Feature Flag
MULPD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMULPD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                         Opcode        Description
MULPD xmm1, xmm2/mem128          66 0F 59 /r   Multiplies two packed double-precision floating-point values in xmm1 by corresponding values in xmm2 or mem128. Writes results to xmm1.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMULPD xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src.0.01    59 /r
VMULPD ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src.1.01    59 /r

Related Instructions

(V)MULPS, (V)MULSD, (V)MULSS

MXCSR Flags Affected

MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M   M   M       M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
                           S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF   S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.

X - AVX and SSE exception
A - AVX exception
S - SSE exception

MULPS Multiply

VMULPS Packed Single-Precision Floating-Point 

Multiplies each packed single-precision floating-point value of the first source operand by the
corresponding packed single-precision floating-point value of the second source operand and writes the
product of each multiplication into the corresponding elements of the destination.

There are legacy and extended forms of the instruction: 

MULPS 

Multiplies four single-precision floating-point values in the first source XMM register by the corre¬ 
sponding single-precision floating-point values of either a second source XMM register or a 128-bit 
memory location. The first source register is also the destination. Bits [255:128] of the YMM register 
that corresponds to the destination are not affected. 

VMULPS 

The extended fonn of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Multiplies four single-precision floating-point values in the first source XMM register by the corre¬ 
sponding single-precision floating-point values of either a second source XMM register or a 128-bit 
memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that 
corresponds to the destination are cleared. 

YMM Encoding 

Multiplies eight single-precision floating-point values in the first source YMM register by the corre¬ 
sponding single-precision floating-point values of either a second source YMM register or a 256-bit 
memory location. Writes the results to a third YMM register. 
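
The difference between the three forms can be illustrated with a small software model. The following Python sketch is not part of the architecture definition. The function names and the representation of a YMM register as a list of eight values are assumptions made for illustration, and the model ignores exceptions, NaN handling, overflow, and MXCSR rounding control.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value.

    Note: struct raises OverflowError for magnitudes outside the
    single-precision range, so this model only handles in-range products.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mulps(dest, src):
    """Model of legacy MULPS. dest and src are lists of 8 floats (a YMM image)."""
    low = [f32(a * b) for a, b in zip(dest[:4], src[:4])]
    return low + dest[4:]        # bits [255:128] are not affected


def vmulps_xmm(src1, src2):
    """Model of VMULPS with the 128-bit (XMM) encoding."""
    low = [f32(a * b) for a, b in zip(src1[:4], src2[:4])]
    return low + [0.0] * 4       # bits [255:128] are cleared


def vmulps_ymm(src1, src2):
    """Model of VMULPS with the 256-bit (YMM) encoding: all eight elements."""
    return [f32(a * b) for a, b in zip(src1, src2)]
```

Calling the three functions with the same inputs shows that only the treatment of the upper four elements differs between the legacy form and the XMM encoding.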

Instruction Support

Form      Subset   Feature Flag
MULPS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMULPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
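
As an illustration of how the feature flags in the table are tested, the following Python sketch checks the two bits in register values that are assumed to have been obtained already from CPUID Fn0000_0001. The function name and the way the EDX and ECX values reach the function are hypothetical. The sketch checks processor support only; the exception table for the instruction lists further operating-system conditions (for example CR4.OSXSAVE) that apply to the extended form.

```python
SSE_BIT = 25  # CPUID Fn0000_0001_EDX[SSE]
AVX_BIT = 28  # CPUID Fn0000_0001_ECX[AVX]


def mulps_forms_supported(edx, ecx):
    """Report which forms the processor advertises.

    edx and ecx are the raw 32-bit values returned by CPUID Fn0000_0001
    (how they are read is outside the scope of this sketch).
    """
    return {
        "MULPS": bool((edx >> SSE_BIT) & 1),
        "VMULPS": bool((ecx >> AVX_BIT) & 1),
    }
```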


Instruction Encoding

Mnemonic                     Opcode      Description
MULPS xmm1, xmm2/mem128      0F 59 /r    Multiplies four packed single-precision floating-point values in xmm1 by corresponding values in xmm2 or mem128. Writes the products to xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMULPS xmm1, xmm2, xmm3/mem128      C4    RXB.01           X.src1.0.00   59 /r
VMULPS ymm1, ymm2, ymm3/mem256      C4    RXB.01           X.src1.1.00   59 /r
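
The VEX fields in the table above can be assembled into bytes as in the following Python sketch. It is an illustration under stated assumptions, not an assembler: the function names are hypothetical, only register-to-register operands numbered 0 through 7 are handled, and the three-byte (C4) form is always emitted even though assemblers commonly choose the shorter two-byte form when it is available. The R, X, B, and vvvv fields are stored in inverted form.

```python
def vex3_prefix(map_select, pp, l, src1, w=0, r=0, x=0, b=0):
    """Build the 3-byte VEX prefix C4 RXB.map_select W.vvvv.L.pp."""
    byte1 = (((~r) & 1) << 7) | (((~x) & 1) << 6) | (((~b) & 1) << 5) | (map_select & 0x1F)
    byte2 = ((w & 1) << 7) | (((~src1) & 0xF) << 3) | ((l & 1) << 2) | (pp & 3)
    return bytes([0xC4, byte1, byte2])


def vmulps_ymm_reg(dest, src1, src2):
    """Encode VMULPS ymm<dest>, ymm<src1>, ymm<src2> for registers 0..7."""
    if not all(0 <= reg <= 7 for reg in (dest, src1, src2)):
        raise ValueError("this sketch handles registers 0..7 only")
    modrm = 0xC0 | (dest << 3) | src2   # mod = 11b, reg = dest, r/m = src2
    # map_select = 01, L = 1 (256-bit), pp = 00, opcode 59 /r
    return vex3_prefix(map_select=1, pp=0, l=1, src1=src1) + bytes([0x59, modrm])
```

For example, `vmulps_ymm_reg(0, 1, 2).hex()` yields `c4e17459c2`.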


Related Instructions

(V)MULPD, (V)MULSD, (V)MULSS 

MXCSR Flags Affected

MM    FZ    RC      PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
                                                              M     M     M           M     M
17    15    14 13   12    11    10    9     8     7     6     5     4     3     2     1     0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 
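
The bit positions in the table can be used to decode an MXCSR value, as in the following Python sketch. The names `MXCSR_FIELDS`, `MULPS_MAY_MODIFY`, and `decode_mxcsr` are assumptions introduced for illustration. The sample value 1F80h used in the usage note is the commonly documented reset value of MXCSR, in which all exception masks are set.

```python
# Bit position of each single-bit MXCSR field, as listed in the table.
MXCSR_FIELDS = {
    "IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5,
    "DAZ": 6, "IM": 7, "DM": 8, "ZM": 9, "OM": 10, "UM": 11, "PM": 12,
    "FZ": 15, "MM": 17,
}

# Flags marked "M" in the table for this instruction.
MULPS_MAY_MODIFY = ("PE", "UE", "OE", "DE", "IE")


def decode_mxcsr(value):
    """Split an MXCSR value into named fields. RC occupies bits 14:13."""
    fields = {name: (value >> bit) & 1 for name, bit in MXCSR_FIELDS.items()}
    fields["RC"] = (value >> 13) & 3
    return fields
```

Decoding 1F80h shows all six mask bits (IM through PM) set and all exception flags clear.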


Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
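
The alignment-related rows of the exception table for MULPS and VMULPS can be read as a small decision procedure. The following Python sketch encodes only those rows, under the assumption that no other exception condition applies; it does not model exception priority, and the function and parameter names are hypothetical.

```python
def alignment_fault(form, address, mxcsr_mm, alignment_checking):
    """Return "#GP", "#AC", or None for a 128-bit memory operand.

    form               -- "legacy" (MULPS) or "vex" (VMULPS)
    address            -- linear address of the memory operand
    mxcsr_mm           -- value of MXCSR.MM (True means 1)
    alignment_checking -- True when alignment checking is enabled
    """
    if address % 16 == 0:
        return None                      # aligned operand: none of the rows apply
    if form == "legacy":
        if not mxcsr_mm:
            return "#GP"                 # non-aligned operand while MXCSR.MM = 0
        return "#AC" if alignment_checking else None
    # Extended (VEX-encoded) form: only the #AC row applies.
    return "#AC" if alignment_checking else None
```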
MULSD, VMULSD
Multiply Scalar Double-Precision Floating-Point

Multiplies the double-precision floating-point value in the low-order quadword of the first source
operand by the double-precision floating-point value in the low-order quadword of the second source
operand and writes the product into the low-order quadword of the destination.

There are legacy and extended forms of the instruction:

MULSD

The first source operand is an XMM register and the second source operand is either an XMM register
or a 64-bit memory location. The first source register is also the destination register. Bits [127:64]
of the destination and bits [255:128] of the corresponding YMM register are not affected.

VMULSD

The extended form of the instruction has a 128-bit encoding only.

The first source operand is an XMM register and the second source operand is either an XMM register
or a 64-bit memory location. The destination is a third XMM register. Bits [127:64] of the first
source operand are copied to bits [127:64] of the destination. Bits [255:128] of the YMM register that
corresponds to the destination are cleared.
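
The handling of the upper bits in the two scalar forms can be sketched in Python. This is a simplified model for illustration only: a YMM register is assumed to be represented as a list of four double-precision values (quadword 0 first), the function names are hypothetical, and exceptions and MXCSR rounding control are not modeled.

```python
def mulsd(dest, src):
    """Model of legacy MULSD: only the low-order quadword changes."""
    return [dest[0] * src[0]] + dest[1:]


def vmulsd(src1, src2):
    """Model of VMULSD: bits [127:64] come from src1, bits [255:128] are cleared."""
    return [src1[0] * src2[0], src1[1], 0.0, 0.0]
```

With `a = [1.5, 2.0, 3.0, 4.0]` and `b = [2.0, 9.0, 9.0, 9.0]`, `mulsd(a, b)` keeps the three upper quadwords of `a`, while `vmulsd(a, b)` keeps only quadword 1 and zeroes the rest.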

Instruction Support

Form      Subset   Feature Flag
MULSD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMULSD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode         Description
MULSD xmm1, xmm2/mem64       F2 0F 59 /r    Multiplies the low-order double-precision floating-point value in xmm1 by the corresponding value in xmm2 or mem64. Writes the product to xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMULSD xmm1, xmm2, xmm3/mem64       C4    RXB.01           X.src1.X.11   59 /r


Related Instructions 

(V)MULPD, (V)MULPS, (V)MULSS 


MXCSR Flags Affected

MM    FZ    RC      PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
                                                              M     M     M           M     M
17    15    14 13   12    11    10    9     8     7     6     5     4     3     2     1     0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
MULSS, VMULSS
Multiply Scalar Single-Precision Floating-Point

Multiplies the single-precision floating-point value in the low-order doubleword of the first source
operand by the single-precision floating-point value in the low-order doubleword of the second
source operand and writes the product into the low-order doubleword of the destination.

There are legacy and extended forms of the instruction:

MULSS

The first source operand is an XMM register and the second source operand is either an XMM register
or a 32-bit memory location. The first source register is also the destination. Bits [127:32] of the
destination register and bits [255:128] of the corresponding YMM register are not affected.

VMULSS

The extended form of the instruction has a 128-bit encoding only.

The first source operand is an XMM register and the second source operand is either an XMM register
or a 32-bit memory location. The destination is a third XMM register. Bits [127:32] of the first
source register are copied to bits [127:32] of the destination. Bits [255:128] of the YMM register
that corresponds to the destination are cleared.

Instruction Support

Form      Subset   Feature Flag
MULSS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMULSS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode         Description
MULSS xmm1, xmm2/mem32       F3 0F 59 /r    Multiplies a single-precision floating-point value in the low-order doubleword of xmm1 by a corresponding value in xmm2 or mem32. Writes the product to xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMULSS xmm1, xmm2, xmm3/mem32       C4    RXB.01           X.src1.X.10   59 /r


Related Instructions 

(V)MULPD, (V)MULPS, (V)MULSD 


MXCSR Flags Affected

MM    FZ    RC      PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
                                                              M     M     M           M     M
17    15    14 13   12    11    10    9     8     7     6     5     4     3     2     1     0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
ORPD, VORPD
OR Packed Double-Precision Floating-Point

Performs bitwise OR of two packed double-precision floating-point values in the first source operand
with the corresponding two packed double-precision floating-point values in the second source
operand and writes the results into the corresponding elements of the destination.

There are legacy and extended forms of the instruction:

ORPD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.

VORPD

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding

The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register.
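
Because the operation is a bitwise OR on the raw bit patterns rather than an arithmetic operation, its effect on floating-point values is easiest to see in a small model. The following Python sketch is an illustration with hypothetical helper names; it models the two-element (128-bit) case only. ORing each element with the bit pattern of -0.0 (sign bit only) forces the sign bit on, which is one familiar use of this kind of instruction.

```python
import struct


def to_bits(x):
    """Return the 64-bit pattern of a double as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits(b):
    """Reinterpret a 64-bit unsigned integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def orpd(src1, src2):
    """Model of the OR itself: element-wise bitwise OR of two packed doubles."""
    return [from_bits(to_bits(a) | to_bits(b)) for a, b in zip(src1, src2)]
```

For example, `orpd([1.5, -2.0], [-0.0, -0.0])` returns `[-1.5, -2.0]`.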

Instruction Support

Form      Subset   Feature Flag
ORPD      SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VORPD     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode         Description
ORPD xmm1, xmm2/mem128       66 0F 56 /r    Performs bitwise OR of two packed double-precision floating-point values in xmm1 with corresponding values in xmm2 or mem128. Writes the result to xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VORPD xmm1, xmm2, xmm3/mem128       C4    RXB.01           X.src1.0.01   56 /r
VORPD ymm1, ymm2, ymm3/mem256       C4    RXB.01           X.src1.1.01   56 /r


Related Instructions 

(V)ANDNPS, (V)ANDPD, (V)ANDPS, (V)ORPS, (V)XORPD, (V)XORPS 


MXCSR Flags Affected

None

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
ORPS, VORPS
OR Packed Single-Precision Floating-Point

Performs bitwise OR of the four packed single-precision floating-point values in the first source
operand with the corresponding four packed single-precision floating-point values in the second
source operand, and writes the result into the corresponding elements of the destination.

There are legacy and extended forms of the instruction:

ORPS

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.

VORPS

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding

The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form      Subset   Feature Flag
ORPS      SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VORPS     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode      Description
ORPS xmm1, xmm2/mem128       0F 56 /r    Performs bitwise OR of four packed single-precision floating-point values in xmm1 with corresponding values in xmm2 or mem128. Writes the result to xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VORPS xmm1, xmm2, xmm3/mem128       C4    RXB.01           X.src1.0.00   56 /r
VORPS ymm1, ymm2, ymm3/mem256       C4    RXB.01           X.src1.1.00   56 /r

Related Instructions 

(V)ANDNPD, (V)ANDNPS, (V)ANDPD, (V)ANDPS, (V)ORPD, (V)XORPD, (V)XORPS 


MXCSR Flags Affected

None

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
PABSB, VPABSB
Packed Absolute Value Signed Byte

Computes the absolute value of 16 or 32 packed 8-bit signed integers in the source operand. Each
byte of the destination receives an unsigned 8-bit integer that is the absolute value of the signed 8-bit
integer in the corresponding byte of the source operand.

There are legacy and extended forms of the instruction:

PABSB

The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPABSB

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

The source operand is a YMM register or a 256-bit memory location. The destination is a YMM
register. All 32 bytes of the destination are written.
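
The per-byte operation can be sketched in Python as follows. The function name and the use of a `bytes` object to stand for the source operand are assumptions for illustration. Because the result is an unsigned 8-bit integer, the most negative input (80h, or -128) produces 80h, which is 128 when read as unsigned.

```python
def pabsb(src):
    """Return the absolute value of each signed byte as an unsigned byte.

    src is a bytes object of length 16 (XMM) or 32 (YMM); the sketch
    accepts any length and processes each byte independently.
    """
    out = bytearray()
    for b in src:
        signed = b - 256 if b >= 0x80 else b   # reinterpret as signed 8-bit
        out.append(abs(signed))                # 0..128 always fits in a byte
    return bytes(out)
```

For example, `pabsb(bytes([0x01, 0xFF, 0x80]))` returns the bytes 01h, 01h, 80h.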


Instruction Support

Form              Subset   Feature Flag
PABSB             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPABSB 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPABSB 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode            Description
PABSB xmm1, xmm2/mem128      66 0F 38 1C /r    Computes the absolute value of each packed 8-bit signed integer value in xmm2/mem128 and writes the 8-bit unsigned results to xmm1.

Mnemonic                      Encoding
                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPABSB xmm1, xmm2/mem128      C4    RXB.02           X.1111.0.01   1C /r
VPABSB ymm1, ymm2/mem256      C4    RXB.02           X.1111.1.01   1C /r


Related Instructions 

(V)PABSW, (V)PABSD 


MXCSR Flags Affected

None

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PABSD, VPABSD
Packed Absolute Value Signed Doubleword

Computes the absolute value of four or eight packed 32-bit signed integers in the source operand.
Each doubleword of the destination receives an unsigned 32-bit integer that is the absolute value of
the signed 32-bit integer in the corresponding doubleword of the source operand.

There are legacy and extended forms of the instruction:

PABSD

The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPABSD

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

The source operand is a YMM register or a 256-bit memory location. The destination is a YMM
register. All eight doublewords of the destination are written.


Instruction Support

Form              Subset   Feature Flag
PABSD             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPABSD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPABSD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode            Description
PABSD xmm1, xmm2/mem128      66 0F 38 1E /r    Computes the absolute value of each packed 32-bit signed integer value in xmm2/mem128 and writes the 32-bit unsigned results to xmm1.

Mnemonic                      Encoding
                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPABSD xmm1, xmm2/mem128      C4    RXB.02           X.1111.0.01   1E /r
VPABSD ymm1, ymm2/mem256      C4    RXB.02           X.1111.1.01   1E /r


Related Instructions 

(V)PABSB, (V)PABSW 


MXCSR Flags Affected

None

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PABSW, VPABSW
Packed Absolute Value Signed Word

Computes the absolute value of eight or sixteen packed 16-bit signed integers in the source operand.
Each word of the destination receives an unsigned 16-bit integer that is the absolute value of the
signed 16-bit integer in the corresponding word of the source operand.

There are legacy and extended forms of the instruction:

PABSW

The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPABSW

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

The source operand is a YMM register or a 256-bit memory location. The destination is a YMM
register. All 16 words of the destination are written.


Instruction Support

Form              Subset   Feature Flag
PABSW             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPABSW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPABSW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode            Description
PABSW xmm1, xmm2/mem128      66 0F 38 1D /r    Computes the absolute value of each packed 16-bit signed integer value in xmm2/mem128 and writes the 16-bit unsigned results to xmm1.

Mnemonic                      Encoding
                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPABSW xmm1, xmm2/mem128      C4    RXB.02           X.1111.0.01   1D /r
VPABSW ymm1, ymm2/mem256      C4    RXB.02           X.1111.1.01   1D /r


Related Instructions 

(V)PABSB, (V)PABSD 


MXCSR Flags Affected

None

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PACKSSDW, VPACKSSDW
Pack with Signed Saturation Doubleword to Word

Converts four or eight 32-bit signed integers from the first source operand and the second source
operand into 16-bit signed integers and packs the results into the destination.

Positive source values greater than 7FFFh are saturated to 7FFFh; negative source values less than
8000h are saturated to 8000h.

Converted values from the first source operand are packed into the low-order words of the
destination; converted values from the second source operand are packed into the high-order words
of the destination.

There are legacy and extended forms of the instruction:

PACKSSDW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.

VPACKSSDW

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
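
The saturation and packing order for the 128-bit forms can be sketched in Python. This is an illustration only: the function names are hypothetical, each source is assumed to be a list of four signed 32-bit integers (doubleword 0 first), and the 256-bit encoding is not modeled.

```python
def saturate_i16(value):
    """Clamp a signed integer to the signed 16-bit range 8000h..7FFFh."""
    return max(-0x8000, min(0x7FFF, value))


def packssdw_128(src1, src2):
    """Return eight signed words: words 0-3 from src1, words 4-7 from src2."""
    return [saturate_i16(v) for v in src1] + [saturate_i16(v) for v in src2]
```

For example, `packssdw_128([1, 70000, -70000, -1], [32767, 32768, -32768, -32769])` returns `[1, 32767, -32768, -1, 32767, 32767, -32768, -32768]`: values outside the signed 16-bit range saturate, and values already in range pass through unchanged.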

Instruction Support

Form                 Subset   Feature Flag
PACKSSDW             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPACKSSDW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPACKSSDW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                        Opcode         Description
PACKSSDW xmm1, xmm2/mem128      66 0F 6B /r    Converts 32-bit signed integers in xmm1 and xmm2 or mem128 into 16-bit signed integers with saturation. Writes packed results to xmm1.

Mnemonic                               Encoding
                                       VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPACKSSDW xmm1, xmm2, xmm3/mem128      C4    RXB.01           X.src1.0.01   6B /r
VPACKSSDW ymm1, ymm2, ymm3/mem256      C4    RXB.01           X.src1.1.01   6B /r
Related Instructions

(V)PACKSSWB, (V)PACKUSDW, (V)PACKUSWB

MXCSR Flags Affected

None

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 

PACKSSWB Pack with Signed Saturation

VPACKSSWB Word to Byte 

Converts eight or sixteen 16-bit signed integers from the first source operand and the second source
operand into sixteen or thirty-two 8-bit signed integers and packs the results into the destination.

Positive source values greater than 7Fh are saturated to 7Fh; negative source values less than 80h are
saturated to 80h.

Converted values from the first source operand are packed into the low-order bytes of the destination;
converted values from the second source operand are packed into the high-order bytes of the
destination.
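
The conversion described above can be sketched as a scalar reference model. This is a minimal illustration of the 128-bit form only, assuming each source is given as a list of eight Python integers already interpreted as signed 16-bit values, with index 0 as the least-significant element. The function names are hypothetical.

```python
def saturate_signed(value, bits):
    """Clamp an integer to the range of a signed two's-complement field."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return max(low, min(high, value))


def packsswb_128(src1_words, src2_words):
    """Model PACKSSWB: two lists of eight signed words in, sixteen signed bytes out."""
    if len(src1_words) != 8 or len(src2_words) != 8:
        raise ValueError("each source must hold eight words")
    # The first source fills the low-order bytes, the second the high-order bytes.
    return [saturate_signed(w, 8) for w in list(src1_words) + list(src2_words)]
```

Values inside the range -128 to 127 pass through unchanged; everything else is pinned to the nearest bound (7Fh or 80h as a bit pattern).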

There are legacy and extended forms of the instruction: 

PACKSSWB 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPACKSSWB 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form               Subset  Feature Flag
PACKSSWB           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPACKSSWB 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPACKSSWB 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                    Opcode       Description
PACKSSWB xmm1, xmm2/mem128  66 0F 63 /r  Converts 16-bit signed integers in xmm1 and xmm2
                                         or mem128 into 8-bit signed integers with saturation.
                                         Writes packed results to xmm1.

Mnemonic                           Encoding
                                   VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPACKSSWB xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.01  63 /r
VPACKSSWB ymm1, ymm2, ymm3/mem256  C4   RXB.01          X.src1.1.01  63 /r

Related Instructions

(V)PACKSSDW, (V)PACKUSDW, (V)PACKUSWB 

MXCSR Flags Affected 

None 


Exceptions 


Exception                  Mode              Cause of Exception
                           Real  Virt  Prot
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 

PACKUSDW Pack with Unsigned Saturation

VPACKUSDW Doubleword to Word 

Converts four or eight 32-bit signed integers from the first source operand and the second source 
operand into eight or sixteen 16-bit unsigned integers and packs the results into the destination. 

Source values greater than FFFFh are saturated to FFFFh; source values less than 0000h are saturated
to 0000h.

Packs converted values from the first source operand into the low-order words of the destination; 
packs converted values from the second source operand into the high-order words of the destination. 
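
Because the sources are signed but the results are unsigned, a model has to reinterpret each 32-bit pattern as a two's-complement value before clamping. The following is a minimal sketch of the 128-bit form, assuming each source is a list of four raw 32-bit patterns (0 to FFFF_FFFFh) with index 0 as the least-significant doubleword. The function names are hypothetical.

```python
def as_signed32(pattern):
    """Reinterpret a 32-bit pattern as a two's-complement signed integer."""
    pattern &= 0xFFFFFFFF
    return pattern - (1 << 32) if pattern & 0x80000000 else pattern


def packusdw_128(src1_dwords, src2_dwords):
    """Model PACKUSDW: two lists of four doubleword patterns in, eight unsigned words out."""
    if len(src1_dwords) != 4 or len(src2_dwords) != 4:
        raise ValueError("each source must hold four doublewords")
    result = []
    # The first source supplies the low-order words, the second the high-order words.
    for pattern in list(src1_dwords) + list(src2_dwords):
        value = as_signed32(pattern)
        result.append(min(max(value, 0x0000), 0xFFFF))
    return result
```

A pattern such as FFFF_FFFFh represents -1 and therefore saturates to 0000h, not to FFFFh.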

There are legacy and extended forms of the instruction: 

PACKUSDW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPACKUSDW 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form               Subset  Feature Flag
PACKUSDW           SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPACKUSDW 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPACKUSDW 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                    Opcode          Description
PACKUSDW xmm1, xmm2/mem128  66 0F 38 2B /r  Converts 32-bit signed integers in xmm1 and xmm2
                                            or mem128 into 16-bit unsigned integers with
                                            saturation. Writes packed results to xmm1.

Mnemonic                           Encoding
                                   VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPACKUSDW xmm1, xmm2, xmm3/mem128  C4   RXB.02          X.src1.0.01  2B /r
VPACKUSDW ymm1, ymm2, ymm3/mem256  C4   RXB.02          X.src1.1.01  2B /r

Related Instructions

(V)PACKSSDW, (V)PACKSSWB, (V)PACKUSWB 

MXCSR Flags Affected 

None 


Exceptions 


Exception                  Mode              Cause of Exception
                           Real  Virt  Prot
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 

PACKUSWB Pack with Unsigned Saturation

VPACKUSWB Word to Byte 

Converts eight or sixteen 16-bit signed integers from the first source operand and the second source
operand into sixteen or thirty-two 8-bit unsigned integers and packs the results into the destination.

When a source value is greater than FFh, it is saturated to FFh; when a source value is less than 00h, it is
saturated to 00h.

Packs converted values from the first source operand into the low-order bytes of the destination; 
packs converted values from the second source operand into the high-order bytes of the destination. 
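
A minimal scalar model of the 128-bit form follows. It clamps each signed word to the unsigned byte range 00h to FFh, so values from 80h through FFh fit in the destination byte and pass through unchanged. The sketch assumes each source is a list of eight Python integers already interpreted as signed 16-bit values, with index 0 as the least-significant word. The function name is hypothetical.

```python
def packuswb_128(src1_words, src2_words):
    """Model PACKUSWB: two lists of eight signed words in, sixteen unsigned bytes out."""
    if len(src1_words) != 8 or len(src2_words) != 8:
        raise ValueError("each source must hold eight words")
    # The first source fills the low-order bytes, the second the high-order bytes.
    return [min(max(w, 0x00), 0xFF) for w in list(src1_words) + list(src2_words)]
```

This differs from the signed pack (PACKSSWB), where the upper bound is 7Fh and negative inputs are kept down to 80h rather than forced to zero.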

There are legacy and extended forms of the instruction: 

PACKUSWB 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPACKUSWB 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form               Subset  Feature Flag
PACKUSWB           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPACKUSWB 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPACKUSWB 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                    Opcode       Description
PACKUSWB xmm1, xmm2/mem128  66 0F 67 /r  Converts 16-bit signed integers in xmm1 and xmm2
                                         or mem128 into 8-bit unsigned integers with saturation.
                                         Writes packed results to xmm1.

Mnemonic                           Encoding
                                   VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPACKUSWB xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.01  67 /r
VPACKUSWB ymm1, ymm2, ymm3/mem256  C4   RXB.01          X.src1.1.01  67 /r

Related Instructions

(V)PACKSSDW, (V)PACKSSWB, (V)PACKUSDW 

MXCSR Flags Affected 

None 


Exceptions 


Exception                  Mode              Cause of Exception
                           Real  Virt  Prot
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 

PADDB Packed Add

VPADDB Bytes 

Adds 16 or 32 packed 8-bit integer values in the first source operand to corresponding values in the 
second source operand and writes the integer sums to the corresponding bytes of the destination. 

This instruction operates on both signed and unsigned integers. When a result overflows, the carry is 
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 8 bits of each 
result are written to the destination. 
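
The wraparound behavior can be sketched with a scalar model. This is a minimal illustration, assuming each operand is a list of 16 or 32 raw byte patterns (0 to FFh); the function name is hypothetical.

```python
def paddb(src1_bytes, src2_bytes):
    """Add corresponding bytes, keeping only the low-order 8 bits of each sum."""
    if len(src1_bytes) != len(src2_bytes) or len(src1_bytes) not in (16, 32):
        raise ValueError("operands must both hold 16 or both hold 32 bytes")
    # Masking with FFh discards the carry out of each byte lane;
    # nothing propagates into the neighboring element.
    return [(a + b) & 0xFF for a, b in zip(src1_bytes, src2_bytes)]
```

The same result pattern is valid for both interpretations: FFh + 01h gives 00h, which reads as 255 + 1 wrapping to 0 for unsigned data and as -1 + 1 = 0 for signed data.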

There are legacy and extended forms of the instruction:

PADDB 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPADDB 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form            Subset  Feature Flag
PADDB           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDB 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDB 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                 Opcode       Description
PADDB xmm1, xmm2/mem128  66 0F FC /r  Adds packed byte integer values in xmm1 and xmm2 or
                                      mem128. Writes the sums to xmm1.

Mnemonic                        Encoding
                                VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPADDB xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.01  FC /r
VPADDB ymm1, ymm2, ymm3/mem256  C4   RXB.01          X.src1.1.01  FC /r

Related Instructions

(V)PADDD, (V)PADDQ, (V)PADDSB, (V)PADDSW, (V)PADDUSB, (V)PADDUSW, (V)PADDW 

MXCSR Flags Affected 

None 


Exceptions 


Exception                  Mode              Cause of Exception
                           Real  Virt  Prot
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 

PADDD Packed Add

VPADDD Doublewords 


Adds 4 or 8 packed 32-bit integer values in the first source operand to corresponding values in the
second source operand and writes integer sums to the corresponding doublewords of the destination.

This instruction operates on both signed and unsigned integers. When a result overflows, the carry is 
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 32 bits of each 
result are written to the destination. 
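
One instruction serves both signed and unsigned data because two's-complement addition produces the same low-order 32 bits either way. The sketch below illustrates this, assuming operands are lists of 4 or 8 raw 32-bit patterns; the names `paddd` and `to_signed32` are hypothetical helpers for this example.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(pattern):
    """View a 32-bit pattern as a two's-complement signed integer."""
    pattern &= MASK32
    return pattern - (1 << 32) if pattern & 0x80000000 else pattern


def paddd(src1_dwords, src2_dwords):
    """Add corresponding doublewords, keeping only the low-order 32 bits of each sum."""
    if len(src1_dwords) != len(src2_dwords) or len(src1_dwords) not in (4, 8):
        raise ValueError("operands must both hold 4 or both hold 8 doublewords")
    return [(a + b) & MASK32 for a, b in zip(src1_dwords, src2_dwords)]
```

Adding the patterns 0000_0005h and FFFF_FFFEh gives 0000_0003h. Read as signed values that is 5 + (-2) = 3; read as unsigned values it is the wrapped sum of 5 and 4,294,967,294.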

There are legacy and extended forms of the instruction:

PADDD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPADDD 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form            Subset  Feature Flag
PADDD           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDD 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDD 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                 Opcode       Description
PADDD xmm1, xmm2/mem128  66 0F FE /r  Adds packed doubleword integer values in xmm1 and
                                      xmm2 or mem128. Writes the sums to xmm1.

Mnemonic                        Encoding
                                VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPADDD xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.01  FE /r
VPADDD ymm1, ymm2, ymm3/mem256  C4   RXB.01          X.src1.1.01  FE /r

Related Instructions

(V)PADDB, (V)PADDQ, (V)PADDSB, (V)PADDSW, (V)PADDUSB, (V)PADDUSW, (V)PADDW 

MXCSR Flags Affected 

None 


Exceptions 


Exception                  Mode              Cause of Exception
                           Real  Virt  Prot
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 

PADDQ Packed Add

VPADDQ Quadwords 

Adds 2 or 4 packed 64-bit integer values in the first source operand to corresponding values in the 
second source operand and writes the integer sums to the corresponding quadwords of the destination. 

This instruction operates on both signed and unsigned integers. When a result overflows, the carry is 
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 64 bits of each 
result are written to the destination. 

There are legacy and extended forms of the instruction:

PADDQ 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPADDQ 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 
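
The three forms differ in how they treat bits [255:128] of the destination YMM register: the legacy form leaves them unchanged, the 128-bit extended form clears them, and the 256-bit extended form computes them. The sketch below models this, assuming each register is represented as a non-negative Python integer of up to 256 bits. All function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def _add_quadwords(a, b, count):
    """Add `count` independent 64-bit lanes; carries do not cross lane boundaries."""
    result = 0
    for i in range(count):
        shift = 64 * i
        lane_sum = ((a >> shift) + (b >> shift)) & MASK64
        result |= lane_sum << shift
    return result


def paddq_legacy(dest_ymm, src):
    """PADDQ xmm1, xmm2: bits [255:128] of the destination YMM register are not affected."""
    upper = dest_ymm & ~MASK128
    return upper | _add_quadwords(dest_ymm & MASK128, src & MASK128, 2)


def vpaddq_128(src1, src2):
    """VPADDQ xmm1, xmm2, xmm3: bits [255:128] of the destination are cleared."""
    return _add_quadwords(src1 & MASK128, src2 & MASK128, 2)


def vpaddq_256(src1, src2):
    """VPADDQ ymm1, ymm2, ymm3: all four quadwords are added."""
    return _add_quadwords(src1, src2, 4)
```

In the legacy model the destination doubles as the first source, which mirrors the two-operand form; the extended models return a fresh value because the destination is a separate third register.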


Instruction Support 


Form            Subset  Feature Flag
PADDQ           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDQ 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDQ 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                 Opcode       Description
PADDQ xmm1, xmm2/mem128  66 0F D4 /r  Adds packed quadword integer values in xmm1 and
                                      xmm2 or mem128. Writes the sums to xmm1.

Mnemonic                        Encoding
                                VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPADDQ xmm1, xmm2, xmm3/mem128  C4   RXB.00001       X.src1.0.01  D4 /r
VPADDQ ymm1, ymm2, ymm3/mem256  C4   RXB.00001       X.src1.1.01  D4 /r

Related Instructions

(V)PADDB, (V)PADDD, (V)PADDSB, (V)PADDSW, (V)PADDUSB, (V)PADDUSW, (V)PADDW 

MXCSR Flags Affected 

None 


Exceptions 


Exception                  Mode              Cause of Exception
                           Real  Virt  Prot
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 

PADDSB Packed Add with Signed Saturation

VPADDSB Bytes 

Adds 16 or 32 packed 8-bit signed integer values in the first source operand to the corresponding
values in the second source operand and writes the signed integer sums to corresponding bytes of the
destination.

Positive sums greater than 7Fh are saturated to 7Fh; negative sums less than 80h are saturated to 80h. 
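
Signed saturation can be sketched as an exact addition followed by a clamp. This is a minimal model, assuming each operand is a list of 16 or 32 Python integers already interpreted as signed bytes (-128 to 127); the function name is hypothetical.

```python
def paddsb(src1_bytes, src2_bytes):
    """Add corresponding signed bytes, clamping each sum to -128..127 (80h..7Fh)."""
    if len(src1_bytes) != len(src2_bytes) or len(src1_bytes) not in (16, 32):
        raise ValueError("operands must both hold 16 or both hold 32 bytes")
    return [max(-128, min(127, a + b)) for a, b in zip(src1_bytes, src2_bytes)]
```

Unlike the wraparound add (PADDB), an overflowing sum such as 100 + 100 stays at the positive bound instead of flipping sign.
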

There are legacy and extended forms of the instruction:

PADDSB 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPADDSB 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form             Subset  Feature Flag
PADDSB           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDSB 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDSB 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                  Opcode       Description
PADDSB xmm1, xmm2/mem128  66 0F EC /r  Adds packed signed 8-bit integer values in xmm1 and
                                       xmm2 or mem128 with signed saturation. Writes the
                                       sums to xmm1.

Mnemonic                         Encoding
                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPADDSB xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.01  EC /r
VPADDSB ymm1, ymm2, ymm3/mem256  C4   RXB.01          X.src1.1.01  EC /r

Related Instructions

(V)PADDB, (V)PADDD, (V)PADDQ, (V)PADDSW, (V)PADDUSB, (V)PADDUSW, (V)PADDW 

MXCSR Flags Affected 

None 


Exceptions 


Exception                  Mode              Cause of Exception
                           Real  Virt  Prot
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 

PADDSW Packed Add with Signed Saturation

VPADDSW Words 

Adds 8 or 16 packed 16-bit signed integer values in the first source operand to the corresponding
values in the second source operand and writes the signed integer sums to the corresponding words of
the destination.

Positive sums greater than 7FFFh are saturated to 7FFFh; negative sums less than 8000h are saturated 
to 8000h. 
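
To make the effect of saturation concrete, the sketch below contrasts a saturating word add with a wraparound add of the same inputs. It assumes operands are lists of 8 or 16 Python integers already interpreted as signed words (-32768 to 32767). Both function names are hypothetical; `wrapping_add_words` stands in for the non-saturating (V)PADDW behavior.

```python
def paddsw(src1_words, src2_words):
    """Add corresponding signed words, clamping each sum to -32768..32767 (8000h..7FFFh)."""
    if len(src1_words) != len(src2_words) or len(src1_words) not in (8, 16):
        raise ValueError("operands must both hold 8 or both hold 16 words")
    return [max(-0x8000, min(0x7FFF, a + b)) for a, b in zip(src1_words, src2_words)]


def wrapping_add_words(src1_words, src2_words):
    """Add corresponding words modulo 2**16 and view each result as signed."""
    result = []
    for a, b in zip(src1_words, src2_words):
        pattern = (a + b) & 0xFFFF
        result.append(pattern - 0x10000 if pattern & 0x8000 else pattern)
    return result
```

With inputs of 30000 and 10000 the exact sum, 40000, does not fit in a signed word. The saturating add returns 32767, while the wraparound add returns -25536.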

There are legacy and extended forms of the instruction: 

PADDSW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPADDSW 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form             Subset  Feature Flag
PADDSW           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDSW 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDSW 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                  Opcode       Description
PADDSW xmm1, xmm2/mem128  66 0F ED /r  Adds packed signed 16-bit integer values in xmm1 and
                                       xmm2 or mem128 with signed saturation. Writes the
                                       sums to xmm1.

Mnemonic                         Encoding
                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPADDSW xmm1, xmm2, xmm3/mem128  C4   RXB.00001       X.src1.0.01  ED /r
VPADDSW ymm1, ymm2, ymm3/mem256  C4   RXB.00001       X.src1.1.01  ED /r

Related Instructions

(V)PADDB, (V)PADDD, (V)PADDQ, (V)PADDSB, (V)PADDUSB, (V)PADDUSW, (V)PADDW 

MXCSR Flags Affected 

None 


Exceptions 


Exception                  Mode              Cause of Exception
                           Real  Virt  Prot
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 

PADDUSB Packed Add with Unsigned Saturation

VPADDUSB Bytes 

Adds 16 or 32 packed 8-bit unsigned integer values in the first source operand to the corresponding 
values in the second source operand and writes the unsigned integer sums to the corresponding bytes 
of the destination. 

Sums greater than FFh are saturated to FFh. 
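
Unsigned saturation needs only an upper clamp, since the sum of two non-negative values cannot fall below zero. A minimal sketch follows, assuming each operand is a list of 16 or 32 unsigned byte values (0 to 255); the function name is hypothetical.

```python
def paddusb(src1_bytes, src2_bytes):
    """Add corresponding unsigned bytes, clamping each sum to FFh."""
    if len(src1_bytes) != len(src2_bytes) or len(src1_bytes) not in (16, 32):
        raise ValueError("operands must both hold 16 or both hold 32 bytes")
    return [min(0xFF, a + b) for a, b in zip(src1_bytes, src2_bytes)]
```

This behavior suits data such as 8-bit pixel intensities, where an overflowing sum should stay at the maximum value rather than wrap around to a small one.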

There are legacy and extended forms of the instruction:

PADDUSB 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPADDUSB 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form              Subset  Feature Flag
PADDUSB           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDUSB 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDUSB 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                   Opcode       Description
PADDUSB xmm1, xmm2/mem128  66 0F DC /r  Adds packed unsigned 8-bit integer values in xmm1
                                        and xmm2 or mem128 with unsigned saturation.
                                        Writes the sums to xmm1.

Mnemonic                          Encoding
                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPADDUSB xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.01  DC /r
VPADDUSB ymm1, ymm2, ymm3/mem256  C4   RXB.01          X.src1.1.01  DC /r

Related Instructions

(V)PADDB, (V)PADDD, (V)PADDQ, (V)PADDSB, (V)PADDSW, (V)PADDUSW, (V)PADDW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PADDUSW Packed Add with Unsigned Saturation 

VPADDUSW Words 


Adds 8 or 16 packed 16-bit unsigned integer values in the first source operand to the corresponding 
values in the second source operand and writes the unsigned integer sums to the corresponding words 
of the destination. 

Sums greater than FFFFh are saturated to FFFFh. 

There are legacy and extended forms of the instruction: 

PADDUSW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPADDUSW 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared. 

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form                Subset   Feature Flag
PADDUSW             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDUSW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDUSW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3. 


Instruction Encoding 

Mnemonic                    Opcode        Description
PADDUSW xmm1, xmm2/mem128   66 0F DD /r   Adds packed unsigned 16-bit integer values in xmm1
                                          and xmm2 or mem128 with unsigned saturation.
                                          Writes the sums to xmm1.

                                    Encoding
Mnemonic                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPADDUSW xmm1, xmm2, xmm3/mem128    C4   RXB.01          X.src1.0.01  DD /r
VPADDUSW ymm1, ymm2, ymm3/mem256    C4   RXB.01          X.src1.1.01  DD /r

Related Instructions 

(V)PADDB, (V)PADDD, (V)PADDQ, (V)PADDSB, (V)PADDSW, (V)PADDUSB, (V)PADDW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PADDW Packed Add 

VPADDW Words 

Adds 8 or 16 packed 16-bit integer values in the first source operand to the corresponding values in the 
second source operand and writes the integer sums to the corresponding words of the destination. 

This instruction operates on both signed and unsigned integers. When a result overflows, the carry is 
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 16 bits of each 
result are written to the destination. 
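
The wraparound behavior, and the reason the same instruction serves signed and unsigned data, can be seen in a small scalar model. This is an illustrative sketch only: the names paddw and as_signed16 are assumptions, and the example uses three-element lists rather than full 8-word or 16-word operands for brevity.

```python
def paddw(src1, src2):
    """Scalar model of (V)PADDW: per-word add, keeping the low-order 16 bits."""
    return [(a + b) & 0xFFFF for a, b in zip(src1, src2)]


def as_signed16(x):
    """Reinterpret a 16-bit pattern as a two's-complement signed value."""
    return x - 0x10000 if x & 0x8000 else x


r = paddw([0xFFFF, 0x7FFF, 0xFFFF], [0x0001, 0x0001, 0xFFFF])
print([hex(x) for x in r])           # ['0x0', '0x8000', '0xfffe']
print([as_signed16(x) for x in r])   # [0, -32768, -2]
```

The third element shows that FFFFh + FFFFh yields FFFEh, which is the correct result for (-1) + (-1) under a signed interpretation, while the second element shows a signed overflow that wraps silently.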

There are legacy and extended forms of the instruction: 

PADDW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPADDW 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared. 

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form              Subset   Feature Flag
PADDW             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3. 


Instruction Encoding 

Mnemonic                    Opcode        Description
PADDW xmm1, xmm2/mem128     66 0F FD /r   Adds packed 16-bit integer values in xmm1 and xmm2
                                          or mem128. Writes the sums to xmm1.

                                    Encoding
Mnemonic                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPADDW xmm1, xmm2, xmm3/mem128      C4   RXB.01          X.src1.0.01  FD /r
VPADDW ymm1, ymm2, ymm3/mem256      C4   RXB.01          X.src1.1.01  FD /r

Related Instructions 

(V)PADDB, (V)PADDD, (V)PADDQ, (V)PADDSB, (V)PADDSW, (V)PADDUSB, (V)PADDUSW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PALIGNR Packed Align Right 

VPALIGNR 

Concatenates one or two pairs of 16-byte values from the first and second source operands and
right-shifts the concatenated values the number of bytes specified by the unsigned immediate operand. 
Writes the least-significant 16 bytes of the shifted result to the destination or writes the
least-significant 16 bytes of the two shifted results to the upper and lower halves of the destination. 

For the 128-bit form of the instruction, the first and second 128-bit source operands are concatenated 
to form a temporary 256-bit value with the first source operand occupying the most-significant half of 
the temporary value. After the right-shift operation, the lower 128 bits of the result are written to the 
destination. 

For the 256-bit form of the instruction, the lower 16 bytes of the first and second source operands are 
concatenated to form a first temporary 256-bit value with the bytes from the first source operand 
occupying the most-significant half of the temporary value. The upper 16 bytes of the first and second 
source operands are concatenated to form a second temporary 256-bit value with the bytes from the 
first source operand occupying the most-significant half of the second temporary value. Both temporary
values are right-shifted the number of bytes specified by the immediate operand. After the right-shift
operation, the lower 16 bytes of the first temporary value are written to the lower 128 bits of the 
destination and the lower 16 bytes of the second temporary value are written to the upper 128 bits of 
the destination. 

The binary value of the immediate operand determines the byte shift value. On each shift the
most-significant byte is set to zero. When the byte shift value is greater than 31, the destination is zeroed. 
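
The concatenate-and-shift operation described above can be sketched as a scalar model. This is a hedged illustration, not the hardware algorithm: the function names palignr128 and vpalignr256 are hypothetical, and operands are represented as Python bytes objects in which index 0 is the least-significant byte.

```python
def palignr128(src1: bytes, src2: bytes, imm8: int) -> bytes:
    """Model one 16-byte lane. Index 0 of each operand is its least-significant byte."""
    # src1 occupies the most-significant half of the 32-byte temporary value.
    temp = int.from_bytes(src2 + src1, "little")
    shifted = temp >> (8 * imm8)   # zeros enter at the most-significant end
    return (shifted & ((1 << 128) - 1)).to_bytes(16, "little")


def vpalignr256(src1: bytes, src2: bytes, imm8: int) -> bytes:
    """256-bit form: lower and upper 16-byte halves are processed separately."""
    lower = palignr128(src1[:16], src2[:16], imm8)
    upper = palignr128(src1[16:], src2[16:], imm8)
    return lower + upper


lo = bytes(range(0, 16))    # second source: bytes 00h..0Fh
hi = bytes(range(16, 32))   # first source: bytes 10h..1Fh
print(palignr128(hi, lo, 4).hex())          # 0405060708090a0b0c0d0e0f10111213
print(palignr128(hi, lo, 32) == bytes(16))  # True
```

A shift of 4 drops the four least-significant bytes of the second source and brings in four bytes from the first source. Because the temporary value is only 32 bytes wide, any shift count above 31 leaves nothing but zeros.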

There are two forms of the instruction. 

PALIGNR 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPALIGNR 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared. 

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form                Subset   Feature Flag
PALIGNR             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPALIGNR 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPALIGNR 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3. 

Instruction Encoding 

Mnemonic                          Opcode              Description
PALIGNR xmm1, xmm2/mem128, imm8   66 0F 3A 0F /r ib   Right-shifts xmm1:xmm2/mem128 imm8
                                                      bytes. Writes shifted result to xmm1.

                                          Encoding
Mnemonic                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPALIGNR xmm1, xmm2, xmm3/mem128, imm8    C4   RXB.03          X.src1.0.01  0F /r ib
VPALIGNR ymm1, ymm2, ymm3/mem256, imm8    C4   RXB.03          X.src1.1.01  0F /r ib


Related Instructions 

None 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PAND Packed AND 

VPAND 

Performs a bitwise AND of the packed values in the first and second source operands and writes the 
result to the destination. 

There are legacy and extended forms of the instruction: 

PAND 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPAND 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared. 

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form             Subset   Feature Flag
PAND             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPAND 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPAND 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3. 


Instruction Encoding 

Mnemonic                    Opcode        Description
PAND xmm1, xmm2/mem128      66 0F DB /r   Performs bitwise AND of values in xmm1 and xmm2 or
                                          mem128. Writes the result to xmm1.

                                    Encoding
Mnemonic                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPAND xmm1, xmm2, xmm3/mem128       C4   RXB.01          X.src1.0.01  DB /r
VPAND ymm1, ymm2, ymm3/mem256       C4   RXB.01          X.src1.1.01  DB /r

Related Instructions 

(V)PANDN, (V)POR, (V)PXOR 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PANDN Packed AND NOT 

VPANDN 

Generates the ones’ complement of the value in the first source operand and performs a bitwise AND 
of the complement and the value in the second source operand. Writes the result to the destination. 
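
Operand order matters for this instruction: it is the first source operand that is complemented. A minimal sketch follows, assuming operands are modeled as Python integers of a given bit width; the function name pandn and the width parameter are illustrative assumptions.

```python
def pandn(src1: int, src2: int, width: int = 128) -> int:
    """Scalar model of (V)PANDN: (NOT src1) AND src2 over `width` bits."""
    mask = (1 << width) - 1
    return (~src1 & mask) & src2


print(bin(pandn(0b1100, 0b1010)))  # 0b10
print(bin(pandn(0b1010, 0b1100)))  # 0b100
```

Swapping the operands changes the result, so the first source acts as a mask of bit positions to clear in the second source.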

There are legacy and extended forms of the instruction: 

PANDN 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPANDN 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared. 

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form              Subset   Feature Flag
PANDN             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPANDN 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPANDN 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3. 


Instruction Encoding 

Mnemonic                    Opcode        Description
PANDN xmm1, xmm2/mem128     66 0F DF /r   Generates ones’ complement of xmm1, then performs
                                          bitwise AND with value in xmm2 or mem128. Writes the
                                          result to xmm1.

                                    Encoding
Mnemonic                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPANDN xmm1, xmm2, xmm3/mem128      C4   RXB.01          X.src1.0.01  DF /r
VPANDN ymm1, ymm2, ymm3/mem256      C4   RXB.01          X.src1.1.01  DF /r

Related Instructions 

(V)PAND, (V)POR, (V)PXOR 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PAVGB Packed Average 

VPAVGB Unsigned Bytes 

Computes the rounded averages of 16 or 32 packed unsigned 8-bit integer values in the first source 
operand and the corresponding values of the second source operand. Writes each average to the
corresponding byte of the destination. 

An average is computed by adding pairs of 8-bit integer values in corresponding positions in the two 
operands, adding 1 to a 9-bit temporary sum, and right-shifting the temporary sum by one bit position. 
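
The rounding step can be made concrete with a scalar model. This is a sketch under the assumption that operands are represented as Python bytes objects; the function name pavgb is illustrative.

```python
def pavgb(src1: bytes, src2: bytes) -> bytes:
    """Scalar model of (V)PAVGB: (a + b + 1) >> 1 for each unsigned byte pair."""
    # Python integers do not overflow, so the intermediate sum is kept in full,
    # as the temporary sum described in the text is.
    return bytes((a + b + 1) >> 1 for a, b in zip(src1, src2))


print(list(pavgb(bytes([1, 0, 255, 4]), bytes([2, 0, 255, 6]))))
# [2, 0, 255, 5]
```

Adding 1 before the shift rounds a fractional average up: the average of 1 and 2 is written as 2. The widened temporary sum ensures that averaging FFh with FFh gives FFh rather than a truncated value.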

There are legacy and extended forms of the instruction: 

PAVGB 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPAVGB 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared. 

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form              Subset   Feature Flag
PAVGB             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPAVGB 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPAVGB 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3. 


Instruction Encoding 


Mnemonic                    Opcode        Description
PAVGB xmm1, xmm2/mem128     66 0F E0 /r   Averages pairs of packed 8-bit unsigned integer values
                                          in xmm1 and xmm2 or mem128. Writes the averages to
                                          xmm1.

                                    Encoding
Mnemonic                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPAVGB xmm1, xmm2, xmm3/mem128      C4   RXB.01          X.src1.0.01  E0 /r
VPAVGB ymm1, ymm2, ymm3/mem256      C4   RXB.01          X.src1.1.01  E0 /r

Related Instructions 

PAVGW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PAVGW Packed Average 

VPAVGW Unsigned Words 

Computes the rounded average of packed unsigned 16-bit integer values in the first source operand 
and the corresponding values of the second source operand. Writes each average to the corresponding 
word of the destination. 

An average is computed by adding pairs of 16-bit integer values in corresponding positions in the two 
operands, adding 1 to a 17-bit temporary sum, and right-shifting the temporary sum by one bit
position. 

There are legacy and extended forms of the instruction: 

PAVGW 

The first source operand is an XMM register and the second source operand is an XMM register or 
128-bit memory location. The destination is the same XMM register as the first source operand; the 
upper 128-bits of the corresponding YMM register are not affected. 

VPAVGW 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared. 

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form              Subset   Feature Flag
PAVGW             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPAVGW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPAVGW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3. 


Instruction Encoding 

Mnemonic                    Opcode        Description
PAVGW xmm1, xmm2/mem128     66 0F E3 /r   Averages pairs of packed 16-bit unsigned integer values
                                          in xmm1 and xmm2 or mem128. Writes the averages to
                                          xmm1.

                                    Encoding
Mnemonic                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPAVGW xmm1, xmm2, xmm3/mem128      C4   RXB.01          X.src1.0.01  E3 /r
VPAVGW ymm1, ymm2, ymm3/mem256      C4   RXB.01          X.src1.1.01  E3 /r

Related Instructions 

(V)PAVGB 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PBLENDVB Variable Blend 

VPBLENDVB Packed Bytes 

Copies packed bytes from either of two sources to a destination, as specified by a mask operand. 

The mask is defined by the most significant bit of each byte of the mask operand. The position of a 
mask bit corresponds to the position of the most significant bit of a copied value. 

• When a mask bit = 0, the specified element of the first source is copied to the corresponding 
position in the destination. 

• When a mask bit = 1, the specified element of the second source is copied to the corresponding 
position in the destination. 
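
The selection rule in the two bullets above can be sketched as a scalar model. The function name pblendvb and the representation of operands as Python bytes objects are assumptions for illustration; only bit 7 of each mask byte is examined.

```python
def pblendvb(src1: bytes, src2: bytes, mask: bytes) -> bytes:
    """Scalar model of (V)PBLENDVB: bit 7 of each mask byte picks the source."""
    return bytes(b if m & 0x80 else a for a, b, m in zip(src1, src2, mask))


src1 = bytes([0x11] * 16)
src2 = bytes([0x22] * 16)
mask = bytes([0x80, 0x7F, 0xFF, 0x00] * 4)
print(pblendvb(src1, src2, mask)[:4].hex())  # 22112211
```

The mask byte 7Fh selects the first source even though seven of its bits are set, because the most significant bit is 0.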

There are legacy and extended forms of the instruction: 

PBLENDVB 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. The mask operand is the implicit 
register XMM0. 

VPBLENDVB 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared. The mask operand is a fourth XMM register 
selected by bits [7:4] of an immediate byte. 

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. The mask operand is a fourth 
YMM register selected by bits [7:4] of an immediate byte. 

Instruction Support 


Form                 Subset   Feature Flag
PBLENDVB             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPBLENDVB 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPBLENDVB 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3. 

Instruction Encoding 


Mnemonic                     Opcode           Description
PBLENDVB xmm1, xmm2/mem128   66 0F 38 10 /r   Selects byte values from xmm1 or xmm2/mem128,
                                              depending on the value of corresponding mask bits
                                              in XMM0. Writes the selected values to xmm1.

                                           Encoding
Mnemonic                                   VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPBLENDVB xmm1, xmm2, xmm3/mem128, xmm4    C4   RXB.03          0.src1.0.01  4C /r is4
VPBLENDVB ymm1, ymm2, ymm3/mem256, ymm4    C4   RXB.03          0.src1.1.01  4C /r is4

Related Instructions 

(V)BLENDVPD, (V)BLENDVPS 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.W = 1.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PBLENDW Blend 

VPBLENDW Packed Words 

Copies packed words from either of two sources to a destination, as specified by an immediate 8-bit 
mask operand. For the 256-bit form, the same 8-bit mask is applied twice; once to select words to be 
written to the lower 128 bits of the destination and again to select words to be written to the upper 128 
bits of the destination. 

Each bit of the mask selects a word from one of the source operands based on the position of the word 
within the operand. Bit 0 of the mask selects the least-significant word (word 0) to be copied, bit 1 
selects the next-most significant word (word 1), and so forth. Bit 7 selects word 7 (the
most-significant word for 128-bit operands). 

For the 256-bit operands, the mask is reused to select words in the upper 128-bits of the source oper¬ 
ands to be copied. Bit 0 of the mask selects word 8, bit 1 selects word 9, and so forth. Finally, bit 7 of 
the mask selects the word from position 15. 

• When a mask bit = 0, the specified element of the first source is copied to the corresponding 
position in the destination. 

• When a mask bit = 1, the specified element of the second source is copied to the corresponding 
position in the destination. 
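The selection rule above can be sketched as a small software model. This is an illustration only, not a definition of the instruction: the function name `pblendw` and the representation of operands as Python lists of 16-bit words (index 0 is the least-significant word) are assumptions made for the example.

```python
def pblendw(src1, src2, imm8):
    """Model the word blend: 8 words for the 128-bit form, 16 for the 256-bit form."""
    if len(src1) != len(src2) or len(src1) not in (8, 16):
        raise ValueError("both operands must hold 8 or 16 words")
    result = []
    for i in range(len(src1)):
        # The same 8-bit mask is reused for words 8..15 of a 256-bit operand.
        bit = (imm8 >> (i % 8)) & 1
        # Mask bit 0 selects the first source, mask bit 1 selects the second source.
        result.append(src2[i] if bit else src1[i])
    return result
```

For example, with `imm8 = 0b00000101`, words 0 and 2 are taken from the second source and all other words from the first source.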

There are legacy and extended forms of the instruction: 

PBLENDW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPBLENDW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support

Form               Subset  Feature Flag
PBLENDW            SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPBLENDW 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPBLENDW 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                          Opcode             Description
PBLENDW xmm1, xmm2/mem128, imm8   66 0F 3A 0E /r ib  Selects word values from xmm1 or
                                                     xmm2/mem128, as specified by imm8.
                                                     Writes the selected values to xmm1.

Mnemonic                                 Encoding
                                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPBLENDW xmm1, xmm2, xmm3/mem128, imm8   C4   RXB.03          X.src1.0.01  0E /r ib
VPBLENDW ymm1, ymm2, ymm3/mem256, imm8   C4   RXB.03          X.src1.1.01  0E /r ib
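The VEX rows above list field values rather than finished bytes. The following sketch assembles the register-to-register XMM form, assuming the standard three-byte VEX layout (C4, then inverted R, X, B with map_select, then W, inverted vvvv, L, and pp). The helper names `vex3` and `encode_vpblendw_xmm` are hypothetical, W is set to 0 although the table marks it as ignored (X), and only registers 0 to 7 are handled so that R, X, and B stay clear.

```python
def vex3(r, x, b, map_select, w, vvvv, l, pp):
    """Build a three-byte VEX prefix; r, x, b and vvvv are passed uninverted."""
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (map_select & 0x1F)
    byte2 = ((w & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 0x3)
    return bytes([0xC4, byte1, byte2])

def encode_vpblendw_xmm(dst, src1, src2, imm8):
    """Encode VPBLENDW xmm_dst, xmm_src1, xmm_src2, imm8 for registers 0..7."""
    for reg in (dst, src1, src2):
        if not 0 <= reg <= 7:
            raise ValueError("this sketch only handles xmm0..xmm7")
    prefix = vex3(0, 0, 0, 0x03, 0, src1, 0, 0b01)   # map 03h, L = 0, pp = 01 (66h)
    modrm = 0xC0 | (dst << 3) | src2                 # mod = 11b: register operand
    return prefix + bytes([0x0E, modrm, imm8 & 0xFF])
```

Under these assumptions, `encode_vpblendw_xmm(1, 2, 3, 0x0F)` produces the bytes C4 E3 69 0E CB 0F.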


Related Instructions 

(V)BLENDPD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 



PCLMULQDQ, VPCLMULQDQ
Carry-less Multiply Quadwords

Performs a carry-less multiplication of a selected quadword element of the first source operand by a 
selected quadword element of the second source operand and writes the product to the destination. 

Carry-less multiplication, also known as binary polynomial multiplication, is the mathematical
operation of computing the product of two operands without generating or propagating carries. It is an
essential component of cryptographic processing, and typically requires a large number of cycles.

The instruction provides an efficient means of performing the operation and is particularly useful in
implementing the Galois counter mode used in the Advanced Encryption Standard (AES). See
Appendix A on page 973 for additional information.

Bits 4 and 0 of an 8-bit immediate byte operand specify which quadword of each source operand to 
multiply, as follows. 


Mnemonic          Imm[0]  Imm[4]  Quadword Operands Selected
(V)PCLMULLQLQDQ   0       0       SRC1[63:0], SRC2[63:0]
(V)PCLMULHQLQDQ   1       0       SRC1[127:64], SRC2[63:0]
(V)PCLMULLQHQDQ   0       1       SRC1[63:0], SRC2[127:64]
(V)PCLMULHQHQDQ   1       1       SRC1[127:64], SRC2[127:64]


Alias mnemonics are provided for the various immediate byte combinations. 
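The operand selection and the carry-less product can be modeled in a few lines. This is a sketch for illustration, assuming 128-bit operands are passed as Python integers; the names `clmul64` and `pclmulqdq` are chosen for the example and are not part of any library.

```python
MASK64 = (1 << 64) - 1

def clmul64(a, b):
    """Carry-less (binary polynomial) product of two 64-bit values."""
    result = 0
    for i in range(64):
        if (b >> i) & 1:
            result ^= a << i      # XOR replaces addition, so no carries propagate
    return result                 # at most 127 significant bits

def pclmulqdq(src1, src2, imm8):
    """Select one quadword from each 128-bit source and multiply them."""
    q1 = ((src1 >> 64) & MASK64) if (imm8 & 0x01) else (src1 & MASK64)  # Imm[0]
    q2 = ((src2 >> 64) & MASK64) if (imm8 & 0x10) else (src2 & MASK64)  # Imm[4]
    return clmul64(q1, q2)
```

For instance, `clmul64(0b11, 0b11)` is `0b101`, because (x + 1)(x + 1) = x^2 + 1 when coefficients are reduced modulo 2.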

There are legacy and extended forms of the instruction:

PCLMULQDQ 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPCLMULQDQ 

The extended form of the instruction has a 128-bit encoding only. 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.


Instruction Support

Form         Subset             Feature Flag
PCLMULQDQ    PCLMULQDQ          CPUID Fn0000_0001_ECX[PCLMULQDQ] (bit 1)
VPCLMULQDQ   AVX or PCLMULQDQ   CPUID Fn0000_0001_ECX[PCLMULQDQ] (bit 1) or
                                CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding


Mnemonic                            Opcode             Description
PCLMULQDQ xmm1, xmm2/mem128, imm8   66 0F 3A 44 /r ib  Performs carry-less multiplication of a
                                                       selected quadword element of xmm1 by a
                                                       selected quadword element of xmm2 or
                                                       mem128. Elements are selected by bits 4
                                                       and 0 of imm8. Writes the product to xmm1.

Mnemonic                                   Encoding
                                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCLMULQDQ xmm1, xmm2, xmm3/mem128, imm8   C4   RXB.00011       X.src.0.01   44 /r ib


Related Instructions 

(V)PMULDQ, (V)PMULUDQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
PCMPEQB, VPCMPEQB
Packed Compare Equal Bytes

Compares packed byte values in the first source operand to corresponding values in the second source 
operand and writes a comparison result to the corresponding byte of the destination. 

When values are equal, the result is FFh; when values are not equal, the result is 00h.
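The element-wise result can be illustrated with a short model. This is a sketch under stated assumptions: operands are Python lists of unsigned element values, and the name `pcmpeq` with its `width_bits` parameter is hypothetical, introduced so that the same function also covers the word, doubleword, and quadword compare-equal forms.

```python
def pcmpeq(src1, src2, width_bits=8):
    """Return all-ones for each equal element pair and zero otherwise."""
    if len(src1) != len(src2):
        raise ValueError("operands must hold the same number of elements")
    all_ones = (1 << width_bits) - 1   # FFh for bytes, FFFFh for words, and so on
    return [all_ones if a == b else 0 for a, b in zip(src1, src2)]
```

Because each result element is either all ones or all zeros, the output can be used directly as a mask for a subsequent bitwise AND or blend.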

There are legacy and extended forms of the instruction: 

PCMPEQB 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPCMPEQB 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form               Subset  Feature Flag
PCMPEQB            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPCMPEQB 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPEQB 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
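As an illustration of how the feature flags above might be tested, the sketch below checks a single bit in a register value that software has already read with CPUID. The table `FEATURE_BITS`, the function `form_supported`, and the parameter names are hypothetical. The sketch covers only the CPUID feature bit; the operating-system enablement conditions listed under Exceptions (for example CR4.OSXSAVE) are not modeled.

```python
# Register and bit position for each form, copied from the Instruction Support table.
FEATURE_BITS = {
    "PCMPEQB": ("edx_fn1", 26),            # SSE2: Fn0000_0001_EDX bit 26
    "VPCMPEQB 128-bit": ("ecx_fn1", 28),   # AVX:  Fn0000_0001_ECX bit 28
    "VPCMPEQB 256-bit": ("ebx_fn7", 5),    # AVX2: Fn0000_0007_EBX_x0 bit 5
}

def form_supported(form, edx_fn1, ecx_fn1, ebx_fn7):
    """Return True if the CPUID feature bit for the given form is set."""
    regs = {"edx_fn1": edx_fn1, "ecx_fn1": ecx_fn1, "ebx_fn7": ebx_fn7}
    reg_name, bit = FEATURE_BITS[form]
    return bool((regs[reg_name] >> bit) & 1)
```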


Instruction Encoding 

Mnemonic                    Opcode       Description
PCMPEQB xmm1, xmm2/mem128   66 0F 74 /r  Compares packed bytes in xmm1 to packed bytes in
                                         xmm2 or mem128. Writes results to xmm1.

Mnemonic                           Encoding
                                   VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPEQB xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  74 /r
VPCMPEQB ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  74 /r


Related Instructions 

(V)PCMPEQD, (V)PCMPEQW, (V)PCMPGTB, (V)PCMPGTD, (V)PCMPGTW

rFLAGS Affected

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PCMPEQD, VPCMPEQD
Packed Compare Equal Doublewords


Compares packed doubleword values in the first source operand to corresponding values in the
second source operand and writes a comparison result to the corresponding doubleword of the
destination.

When values are equal, the result is FFFFFFFFh; when values are not equal, the result is 00000000h.

There are legacy and extended forms of the instruction:

PCMPEQD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPCMPEQD 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form               Subset  Feature Flag
PCMPEQD            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPCMPEQD 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPEQD 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                    Opcode       Description
PCMPEQD xmm1, xmm2/mem128   66 0F 76 /r  Compares packed doublewords in xmm1 to packed
                                         doublewords in xmm2 or mem128. Writes results to
                                         xmm1.

Mnemonic                           Encoding
                                   VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPEQD xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  76 /r
VPCMPEQD ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  76 /r

Related Instructions 

(V)PCMPEQB, (V)PCMPEQW, (V)PCMPGTB, (V)PCMPGTD, (V)PCMPGTW

rFLAGS Affected

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PCMPEQQ, VPCMPEQQ
Packed Compare Equal Quadwords

Compares packed quadword values in the first source operand to corresponding values in the second 
source operand and writes a comparison result to the corresponding quadword of the destination. 

When values are equal, the result is FFFFFFFFFFFFFFFFh; when values are not equal, the result is
0000000000000000h.

There are legacy and extended forms of the instruction:

PCMPEQQ 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPCMPEQQ 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form               Subset  Feature Flag
PCMPEQQ            SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPCMPEQQ 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPEQQ 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                    Opcode          Description
PCMPEQQ xmm1, xmm2/mem128   66 0F 38 29 /r  Compares packed quadwords in xmm1 to packed
                                            quadwords in xmm2 or mem128. Writes results to
                                            xmm1.

Mnemonic                           Encoding
                                   VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPEQQ xmm1, xmm2, xmm3/mem128   C4   RXB.02          X.src1.0.01  29 /r
VPCMPEQQ ymm1, ymm2, ymm3/mem256   C4   RXB.02          X.src1.1.01  29 /r

Related Instructions

(V)PCMPEQB, (V)PCMPEQW, (V)PCMPGTB, (V)PCMPGTD, (V)PCMPGTW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 



PCMPEQW, VPCMPEQW
Packed Compare Equal Words

Compares packed word values in the first source operand to corresponding values in the second 
source operand and writes a comparison result to the corresponding word of the destination. 

When values are equal, the result is FFFFh; when values are not equal, the result is 0000h.

There are legacy and extended forms of the instruction: 

PCMPEQW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPCMPEQW 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form               Subset  Feature Flag
PCMPEQW            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPCMPEQW 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPEQW 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                    Opcode       Description
PCMPEQW xmm1, xmm2/mem128   66 0F 75 /r  Compares packed words in xmm1 to packed words in
                                         xmm2 or mem128. Writes results to xmm1.

Mnemonic                           Encoding
                                   VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPEQW xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  75 /r
VPCMPEQW ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  75 /r

Related Instructions 

(V)PCMPEQB, (V)PCMPEQD, (V)PCMPGTB, (V)PCMPGTD, (V)PCMPGTW

rFLAGS Affected

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PCMPESTRI, VPCMPESTRI
Packed Compare Explicit Length Strings Return Index

Compares character string data in the first and second source operands. Comparison operations are
carried out as specified by values encoded in the immediate operand. Writes an index to the ECX
register.

Source operands are formatted as packed characters in one of two supported widths: 8 or 16 bits.
Characters may be treated as either signed or unsigned values. Each operand has associated with it a
separate integer value specifying the length of the string.

The absolute value of the data in the EAX/RAX register represents the length of the character string 
in the first source operand; the absolute value of the data in the EDX/RDX register represents the 
length of the character string in the second source operand. 

If the absolute value of the data in either register is greater than the maximum string length that fits in 
128 bits, the length is set to the maximum: 8, for 16-bit characters, or 16, for 8-bit characters. 

The comparison operations between the two operand strings are summarized in an intermediate
result, a comparison summary bit vector that is post-processed to produce the final output. Data
fields within the immediate byte specify the source data format, comparison type, comparison
summary bit vector post-processing, and output option selection.

The index of either the most significant or least significant set bit of the post-processed comparison 
summary bit vector is returned in ECX. If no bits are set in the post-processed comparison summary 
bit vector, ECX is set to 16 for source operand strings composed of 8-bit characters or 8 for 16-bit 
character strings. 
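The length saturation and index selection described above can be sketched as follows. This models only those two steps, not the comparison itself, which is defined in Section 1.5. The function names and the boolean `most_significant` parameter are assumptions for the example; in the instruction, the choice between most and least significant set bit is made through the immediate byte.

```python
def effective_length(reg_value, char_bits):
    """Clamp the absolute value of a length register to the maximum string length."""
    max_len = 128 // char_bits          # 16 for 8-bit characters, 8 for 16-bit
    return min(abs(reg_value), max_len)

def result_index(post_processed, char_bits, most_significant=False):
    """Index returned in ECX for a post-processed comparison summary bit vector."""
    max_len = 128 // char_bits
    vector = post_processed & ((1 << max_len) - 1)
    if vector == 0:
        return max_len                  # no bits set: 16 or 8
    if most_significant:
        return vector.bit_length() - 1
    return (vector & -vector).bit_length() - 1   # isolate the lowest set bit
```

For example, a length register holding -5 yields an effective length of 5, and a post-processed vector of `0b01100` yields index 2 (least significant) or 3 (most significant).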

See Section 1.5, “String Compare Instructions” for information about source string data format,
comparison operations, comparison summary bit vector generation, post-processing, and output selection
options.

The rFLAGS are set to indicate the following conditions: 


Flag  Condition
CF    Cleared if the comparison summary bit vector is zero; otherwise set.
PF    Cleared.
AF    Cleared.
ZF    Set if the specified length of the second string is less than the maximum; otherwise cleared.
SF    Set if the specified length of the first string is less than the maximum; otherwise cleared.
OF    Equal to the value of the lsb of the post-processed comparison summary bit vector.
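The flag conditions in the table can be expressed as a small function. This is a sketch that follows the table literally: CF is derived from the vector passed as `summary` and OF from the vector passed as `post_processed`. The function name, the dictionary return value, and the parameter names are hypothetical; the lengths are the already-saturated string lengths.

```python
def string_compare_flags(summary, post_processed, len1, len2, char_bits):
    """Return the arithmetic flag values produced by the string compare."""
    max_len = 128 // char_bits          # 16 for 8-bit characters, 8 for 16-bit
    return {
        "CF": int(summary != 0),        # cleared only when the vector is zero
        "PF": 0,                        # always cleared
        "AF": 0,                        # always cleared
        "ZF": int(len2 < max_len),      # second string shorter than the maximum
        "SF": int(len1 < max_len),      # first string shorter than the maximum
        "OF": post_processed & 1,       # lsb of the post-processed vector
    }
```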


There are legacy and extended forms of the instruction:

PCMPESTRI 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. A result index is written to the ECX register. 

VPCMPESTRI 

The extended form of the instruction has a 128-bit encoding only.

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. A result index is written to the ECX register.


Instruction Support 


Form         Subset  Feature Flag
PCMPESTRI    SSE4.2  CPUID Fn0000_0001_ECX[SSE42] (bit 20)
VPCMPESTRI   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                            Opcode             Description
PCMPESTRI xmm1, xmm2/mem128, imm8   66 0F 3A 61 /r ib  Compares packed string data in xmm1 and
                                                       xmm2 or mem128. Writes a result index to
                                                       the ECX register.

Mnemonic                             Encoding
                                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPESTRI xmm1, xmm2/mem128, imm8   C4   RXB.00011       X.1111.0.01  61 /r ib


Related Instructions 

(V)PCMPESTRM, (V)PCMPISTRI, (V)PCMPISTRM 


rFLAGS Affected 


ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          M                   M    M    0    0    M
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag that is set or cleared is M (modified). Unaffected flags are blank.
Undefined flags are U.


MXCSR Flags Affected 

None

Exceptions


Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
PCMPESTRM, VPCMPESTRM
Packed Compare Explicit Length Strings Return Mask

Compares character string data in the first and second source operands. Comparison operations are
carried out as specified by values encoded in the immediate operand. Writes a mask value to the
YMM0/XMM0 register.

Source operands are formatted as packed characters in one of two supported widths: 8 or 16 bits.
Characters may be treated as either signed or unsigned values. Each operand has associated with it a
separate integer value specifying the length of the string.

The absolute value of the data in the EAX/RAX register represents the length of the character string 
in the first source operand; the absolute value of the data in the EDX/RDX register represents the 
length of the character string in the second source operand. 

If the absolute value of the data in either register is greater than the maximum string length that fits in 
128 bits, the length is set to the maximum: 8, for 16-bit characters, or 16, for 8-bit characters. 

The comparison operations between the two operand strings are summarized in an intermediate
result, a comparison summary bit vector that is post-processed to produce the final output. Data
fields within the immediate byte specify the source data format, comparison type, comparison
summary bit vector post-processing, and output option selection.

Depending on the output option selected, the post-processed comparison summary bit vector is either
zero-extended to 128 bits or expanded into a byte/word mask and then written to XMM0.
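The two output options can be illustrated with a short model. This is a sketch only: the function name `pcmpestrm_output` and the boolean `expand` parameter are assumptions for the example (the instruction selects the option through the immediate byte), and the 128-bit result is represented as a Python integer.

```python
def pcmpestrm_output(post_processed, char_bits, expand):
    """Return the 128-bit value written to XMM0 for a post-processed vector."""
    count = 128 // char_bits                 # 16 byte elements or 8 word elements
    vector = post_processed & ((1 << count) - 1)
    if not expand:
        return vector                        # zero-extended to 128 bits
    ones = (1 << char_bits) - 1              # FFh or FFFFh
    result = 0
    for i in range(count):
        if (vector >> i) & 1:
            result |= ones << (i * char_bits)    # element i becomes all ones
    return result
```

With 8-bit characters, a vector of `0b101` gives 5 when zero-extended and 0x00FF00FF when expanded into a byte mask.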

See Section 1.5, “String Compare Instructions” for information about source string data format,
comparison operations, comparison summary bit vector generation, post-processing, and output selection
options.

The rFLAGS are set to indicate the following conditions: 


Flag  Condition
CF    Cleared if the comparison summary bit vector is zero; otherwise set.
PF    Cleared.
AF    Cleared.
ZF    Set if the specified length of the second string is less than the maximum; otherwise cleared.
SF    Set if the specified length of the first string is less than the maximum; otherwise cleared.
OF    Equal to the value of the lsb of the post-processed summary bit vector.


There are legacy and extended forms of the instruction:

PCMPESTRM

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The mask result is written to the XMM0 register.

VPCMPESTRM

The extended form of the instruction has a 128-bit encoding only.

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The mask result is written to the XMM0 register. Bits [255:128] of the
YMM0 register are cleared.


Instruction Support 


Form         Subset  Feature Flag
PCMPESTRM    SSE4.2  CPUID Fn0000_0001_ECX[SSE42] (bit 20)
VPCMPESTRM   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                            Opcode             Description
PCMPESTRM xmm1, xmm2/mem128, imm8   66 0F 3A 60 /r ib  Compares packed string data in xmm1 and
                                                       xmm2 or mem128. Writes a mask value to
                                                       the XMM0 register.

Mnemonic                             Encoding
                                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPESTRM xmm1, xmm2/mem128, imm8   C4   RXB.00011       X.1111.0.01  60 /r ib


Related Instructions 

(V)PCMPESTRI, (V)PCMPISTRI, (V)PCMPISTRM 


rFLAGS Affected 


ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          M                   M    M    0    0    M
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag that is set or cleared is M (modified). Unaffected flags are blank.
Undefined flags are U.


MXCSR Flags Affected 

None

Exceptions


Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
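
The #UD rows that apply to the VEX-encoded form can be read as a single predicate. The following C sketch is illustrative only: the structure, its field names, and the function name are hypothetical, the state is supplied by the caller rather than read from the processor, and the order in which the conditions are tested is an assumption (the table does not specify a priority).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the state named in the #UD rows for the VEX form. */
struct vex_ud_inputs {
    bool     avx_supported;            /* CPUID Fn0000_0001_ECX[AVX] */
    bool     protected_mode;           /* false models real or virtual-8086 mode */
    bool     cr4_osxsave;              /* CR4.OSXSAVE */
    uint64_t xfeature_mask;            /* XFEATURE_ENABLED_MASK */
    uint8_t  vex_vvvv;                 /* 4-bit VEX.vvvv field as encoded */
    uint8_t  vex_l;                    /* VEX.L */
    bool     legacy_prefix_before_vex; /* REX, F2, F3, or 66 before the VEX prefix */
    bool     lock_prefix;              /* F0h preceding the opcode */
};

/* Returns true when any #UD cause listed for the AVX form is present. */
bool vpcmpestrm_raises_ud(const struct vex_ud_inputs *s)
{
    if (!s->avx_supported) return true;                         /* not supported per CPUID */
    if (!s->protected_mode) return true;                        /* AVX only in protected mode */
    if (!s->cr4_osxsave) return true;                           /* CR4.OSXSAVE = 0 */
    if (((s->xfeature_mask >> 1) & 0x3u) != 0x3u) return true;  /* bits [2:1] != 11b */
    if ((s->vex_vvvv & 0xFu) != 0xFu) return true;              /* VEX.vvvv != 1111b */
    if (s->vex_l != 0) return true;                             /* VEX.L = 1 */
    if (s->legacy_prefix_before_vex) return true;
    if (s->lock_prefix) return true;
    return false;
}
```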
PCMPGTB
VPCMPGTB

Packed Compare Greater Than Signed Bytes

Compares packed signed byte values in the first source operand to corresponding values in the second 
source operand and writes a comparison result to the corresponding byte of the destination. 

When a value in the first operand is greater than a value in the second source operand, the result is 
FFh; when a value in the first operand is less than or equal to a value in the second operand, the result 
is 00h.
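
The element-wise rule above can be sketched as a scalar reference model in C. This is an illustration of the comparison semantics only, not of any particular implementation; the function and parameter names are hypothetical, and the 128-bit operands are modeled as arrays of 16 signed bytes.

```c
#include <stdint.h>

/* Scalar model of the byte-wise signed greater-than compare.
   Names are illustrative; element 0 corresponds to bits [7:0]. */
void pcmpgtb_model(const int8_t src1[16], const int8_t src2[16], uint8_t dst[16])
{
    for (int i = 0; i < 16; i++) {
        /* FFh when src1 > src2 as signed values, otherwise 00h */
        dst[i] = (src1[i] > src2[i]) ? 0xFF : 0x00;
    }
}
```

Because the comparison is signed, a byte holding FFh (-1) is not greater than a byte holding 01h.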

There are legacy and extended forms of the instruction: 

PCMPGTB 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPCMPGTB 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form                Subset    Feature Flag
PCMPGTB             SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPCMPGTB 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPGTB 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
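
As an illustration of how the feature flags in the table map to bit tests, the following C sketch checks the three bits named above. It is a minimal sketch under stated assumptions: the register values are passed in by the caller (executing CPUID itself is outside the sketch), and all function names are hypothetical. Operating-system enablement of AVX state is a separate requirement and is not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Tests one bit of a CPUID output register supplied by the caller. */
bool cpuid_bit_set(uint32_t reg, unsigned bit)
{
    return ((reg >> bit) & 1u) != 0;
}

/* PCMPGTB: CPUID Fn0000_0001_EDX[SSE2], bit 26 */
bool supports_pcmpgtb(uint32_t fn1_edx)
{
    return cpuid_bit_set(fn1_edx, 26);
}

/* VPCMPGTB 128-bit: CPUID Fn0000_0001_ECX[AVX], bit 28 */
bool supports_vpcmpgtb_128(uint32_t fn1_ecx)
{
    return cpuid_bit_set(fn1_ecx, 28);
}

/* VPCMPGTB 256-bit: CPUID Fn0000_0007_EBX[AVX2]_x0, bit 5 */
bool supports_vpcmpgtb_256(uint32_t fn7_ebx_x0)
{
    return cpuid_bit_set(fn7_ebx_x0, 5);
}
```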


Instruction Encoding 

Mnemonic                            Opcode         Description
PCMPGTB xmm1, xmm2/mem128           66 0F 64 /r    Compares packed bytes in xmm1 to packed bytes in
                                                   xmm2 or mem128. Writes results to xmm1.

Mnemonic                            Encoding
                                    VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPCMPGTB xmm1, xmm2, xmm3/mem128    C4     RXB.01            X.src1.0.01    64 /r
VPCMPGTB ymm1, ymm2, ymm3/mem256    C4     RXB.01            X.src1.1.01    64 /r
Related Instructions

(V)PCMPEQB, (V)PCMPEQD, (V)PCMPEQW, (V)PCMPGTD, (V)PCMPGTW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
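
The two alignment-check rows differ in operand size and in the MXCSR.MM qualifier. A minimal C sketch of the address test follows; the function names are hypothetical, the operating mode columns are not modeled, and "alignment checking enabled" is passed in as a single flag rather than derived from control-register state.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when addr is not a multiple of operand_bytes (a power of two: 16 or 32). */
bool is_misaligned(uint64_t addr, unsigned operand_bytes)
{
    return (addr & (uint64_t)(operand_bytes - 1u)) != 0;
}

/* Legacy SSE row: 16-byte operand, alignment checking enabled, MXCSR.MM = 1. */
bool sse_form_raises_ac(uint64_t addr, bool align_check_enabled, bool mxcsr_mm)
{
    return align_check_enabled && mxcsr_mm && is_misaligned(addr, 16);
}

/* AVX/AVX2 row: 128-bit operand needs 16-byte alignment,
   256-bit operand needs 32-byte alignment. */
bool vex_form_raises_ac(uint64_t addr, bool align_check_enabled, bool is_256bit)
{
    return align_check_enabled && is_misaligned(addr, is_256bit ? 32u : 16u);
}
```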
PCMPGTD
VPCMPGTD

Packed Compare Greater Than Signed Doublewords

Compares packed signed doubleword values in the first source operand to corresponding values in the
second source operand and writes a comparison result to the corresponding doubleword of the
destination.

When a value in the first operand is greater than a value in the second operand, the result is 
FFFFFFFFh; when a value in the first operand is less than or equal to a value in the second operand, 
the result is 00000000h.
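
Because each result element is either all ones or all zeros, it can serve directly as a select mask. The following C sketch illustrates one such use, an element-wise signed maximum of four doublewords; this usage is an example chosen for illustration and is not described by this manual. All names are hypothetical, and the conversion of the merged bit pattern back to a signed type assumes the usual two's-complement behavior.

```c
#include <stdint.h>

/* Model of one doubleword compare: FFFFFFFFh when a > b (signed), else 00000000h. */
uint32_t gt_mask32(int32_t a, int32_t b)
{
    return (a > b) ? 0xFFFFFFFFu : 0x00000000u;
}

/* Element-wise signed maximum built from the mask:
   (a AND mask) OR (b AND NOT mask). */
void max4_via_mask(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        uint32_t m = gt_mask32(a[i], b[i]);
        out[i] = (int32_t)(((uint32_t)a[i] & m) | ((uint32_t)b[i] & ~m));
    }
}
```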

There are legacy and extended forms of the instruction:

PCMPGTD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPCMPGTD 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form                Subset    Feature Flag
PCMPGTD             SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPCMPGTD 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPGTD 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                            Opcode         Description
PCMPGTD xmm1, xmm2/mem128           66 0F 66 /r    Compares packed doublewords in xmm1 to packed
                                                   doublewords in xmm2 or mem128. Writes results
                                                   to xmm1.

Mnemonic                            Encoding
                                    VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPCMPGTD xmm1, xmm2, xmm3/mem128    C4     RXB.01            X.src1.0.01    66 /r
VPCMPGTD ymm1, ymm2, ymm3/mem256    C4     RXB.01            X.src1.1.01    66 /r
Related Instructions

(V)PCMPEQB, (V)PCMPEQD, (V)PCMPEQW, (V)PCMPGTB, (V)PCMPGTW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PCMPGTQ
VPCMPGTQ

Packed Compare Greater Than Signed Quadwords

Compares packed signed quadword values in the first source operand to corresponding values in the
second source operand and writes a comparison result to the corresponding quadword of the
destination.

When a value in the first operand is greater than a value in the second operand, the result is 
FFFFFFFFFFFFFFFFh; when a value in the first operand is less than or equal to a value in the second 
operand, the result is 0000000000000000h.

There are legacy and extended forms of the instruction:

PCMPGTQ 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPCMPGTQ 

The extended form of the instruction has 128-bit and 256-bit encodings: 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form                Subset    Feature Flag
PCMPGTQ             SSE4.2    CPUID Fn0000_0001_ECX[SSE42] (bit 20)
VPCMPGTQ 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPGTQ 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                            Opcode            Description
PCMPGTQ xmm1, xmm2/mem128           66 0F 38 37 /r    Compares packed quadwords in xmm1 to packed
                                                      quadwords in xmm2 or mem128. Writes results
                                                      to xmm1.

Mnemonic                            Encoding
                                    VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPCMPGTQ xmm1, xmm2, xmm3/mem128    C4     RXB.02            X.src1.0.01    37 /r
VPCMPGTQ ymm1, ymm2, ymm3/mem256    C4     RXB.02            X.src1.1.01    37 /r
Related Instructions

(V)PCMPEQB, (V)PCMPEQD, (V)PCMPEQW, (V)PCMPGTB, (V)PCMPGTW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PCMPGTW
VPCMPGTW

Packed Compare Greater Than Signed Words

Compares packed signed word values in the first source operand to corresponding values in the
second source operand and writes a comparison result to the corresponding word of the destination.

When a value in the first operand is greater than a value in the second operand, the result is FFFFh; 
when a value in the first operand is less than or equal to a value in the second operand, the result is 
0000h.

There are legacy and extended forms of the instruction: 

PCMPGTW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPCMPGTW 

The extended form of the instruction has 128-bit and 256-bit encodings:

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 
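
The difference between the legacy form and the extended XMM encoding in their treatment of bits [255:128] can be illustrated with a small register model in C. This is a sketch under stated assumptions: the 256-bit register is modeled as 16 words with words 0 to 7 holding bits [127:0], and all type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* A 256-bit register modeled as 16 words; w[0..7] are bits [127:0]. */
typedef struct { uint16_t w[16]; } ymm_model;

/* Signed word compare of the low 128 bits: FFFFh if s1 > s2, else 0000h. */
static void compare_low_words(const ymm_model *s1, const ymm_model *s2, uint16_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = ((int16_t)s1->w[i] > (int16_t)s2->w[i]) ? 0xFFFF : 0x0000;
}

/* Legacy PCMPGTW: the first source is also the destination;
   bits [255:128] of the destination are not affected. */
void pcmpgtw_legacy_model(ymm_model *dst_src1, const ymm_model *src2)
{
    uint16_t r[8];
    compare_low_words(dst_src1, src2, r);
    memcpy(dst_src1->w, r, sizeof r);
}

/* VPCMPGTW, XMM encoding: separate destination;
   bits [255:128] of the destination are cleared. */
void vpcmpgtw128_model(ymm_model *dst, const ymm_model *src1, const ymm_model *src2)
{
    uint16_t r[8];
    compare_low_words(src1, src2, r);
    memcpy(dst->w, r, sizeof r);
    memset(dst->w + 8, 0, 8 * sizeof(uint16_t));
}
```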


Instruction Support 


Form                Subset    Feature Flag
PCMPGTW             SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPCMPGTW 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPGTW 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                            Opcode         Description
PCMPGTW xmm1, xmm2/mem128           66 0F 65 /r    Compares packed words in xmm1 to packed words in
                                                   xmm2 or mem128. Writes results to xmm1.

Mnemonic                            Encoding
                                    VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPCMPGTW xmm1, xmm2, xmm3/mem128    C4     RXB.01            X.src1.0.01    65 /r
VPCMPGTW ymm1, ymm2, ymm3/mem256    C4     RXB.01            X.src1.1.01    65 /r
Related Instructions

(V)PCMPEQB, (V)PCMPEQD, (V)PCMPEQW, (V)PCMPGTB, (V)PCMPGTD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PCMPISTRI
VPCMPISTRI

Packed Compare Implicit Length Strings Return Index

Compares character string data in the first and second source operands. Comparison operations are
carried out as specified by values encoded in the immediate operand. Writes an index to the ECX
register.

Source operands are formatted as packed characters in one of two supported widths: 8 or 16 bits.
Characters may be treated as either signed or unsigned values. 

Source operand strings shorter than the maximum that can be packed into a 128-bit value are
terminated by a null character (value of 0). The characters prior to the null character constitute the string. If
the first (lowest indexed) character is null, the string length is 0. 
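
The implicit-length rule can be sketched in C as a scan for the first null character. The function names are hypothetical; each operand is modeled as an array with element 0 as the lowest-indexed character.

```c
#include <stddef.h>
#include <stdint.h>

/* Length of a packed string of 8-bit characters: the index of the first
   null character, or 16 when the operand contains no null. */
size_t implicit_length_bytes(const uint8_t s[16])
{
    size_t i = 0;
    while (i < 16 && s[i] != 0)
        i++;
    return i;
}

/* Same rule for 16-bit characters: at most 8 characters fit in 128 bits. */
size_t implicit_length_words(const uint16_t s[8])
{
    size_t i = 0;
    while (i < 8 && s[i] != 0)
        i++;
    return i;
}
```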

The comparison operations between the two operand strings are summarized in an intermediate
result: a comparison summary bit vector that is post-processed to produce the final output. Data
fields within the immediate byte specify the source data format, comparison type, comparison
summary bit vector post-processing, and output option selection.

The index of either the most significant or least significant set bit of the post-processed comparison 
summary bit vector is returned in ECX. If no bits are set in the post-processed comparison summary 
bit vector, ECX is set to 16 for source operand strings composed of 8-bit characters or 8 for 16-bit 
character strings. 
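
The index selection described above can be sketched in C as follows. The function name and parameters are hypothetical: vec stands for the post-processed comparison summary bit vector, nchars is 16 for 8-bit characters or 8 for 16-bit characters, and use_msb stands in for the output option that selects the most significant rather than the least significant set bit.

```c
#include <stdint.h>

/* Returns the value written to ECX in this model: the index of the selected
   set bit of vec, or nchars when no bit is set. */
unsigned result_index(uint16_t vec, unsigned nchars, int use_msb)
{
    /* Only the low nchars bits are meaningful. */
    if (nchars < 16)
        vec &= (uint16_t)((1u << nchars) - 1u);

    if (vec == 0)
        return nchars;              /* 16 for byte strings, 8 for word strings */

    if (use_msb) {
        unsigned i = nchars - 1;
        while (((vec >> i) & 1u) == 0)
            i--;
        return i;
    } else {
        unsigned i = 0;
        while (((vec >> i) & 1u) == 0)
            i++;
        return i;
    }
}
```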

See Section 1.5, “String Compare Instructions” for information about source string data format,
comparison operations, comparison summary bit vector generation, post-processing, and output selection
options.

The rFLAGS are set to indicate the following conditions: 


Flag    Condition
CF      Cleared if the comparison summary bit vector is zero; otherwise set.
PF      Cleared.
AF      Cleared.
ZF      Set if any byte (word) in the second operand is null; otherwise cleared.
SF      Set if any byte (word) in the first operand is null; otherwise cleared.
OF      Equal to the value of the lsb of the post-processed summary bit vector.
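
The flag conditions in the table can be sketched in C for the 8-bit character case. This is an illustration only: the type and function names are hypothetical, and the sketch assumes that the CF and OF rows refer to the same post-processed summary bit vector, which is passed in by the caller rather than computed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the six flags written by the instruction. */
struct str_flags { bool cf, pf, af, zf, sf, of; };

/* True when any of the 16 byte characters is null. */
static bool any_null_byte(const uint8_t s[16])
{
    for (int i = 0; i < 16; i++)
        if (s[i] == 0)
            return true;
    return false;
}

struct str_flags pcmpistri_flags_bytes(const uint8_t op1[16],
                                       const uint8_t op2[16],
                                       uint16_t post_processed_vec)
{
    struct str_flags f;
    f.cf = (post_processed_vec != 0);        /* cleared if the vector is zero */
    f.pf = false;                            /* always cleared */
    f.af = false;                            /* always cleared */
    f.zf = any_null_byte(op2);               /* null in second operand */
    f.sf = any_null_byte(op1);               /* null in first operand */
    f.of = (post_processed_vec & 1u) != 0;   /* lsb of the vector */
    return f;
}
```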


There are legacy and extended forms of the instruction: 

PCMPISTRI 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. A result index is written to the ECX register. 

VPCMPISTRI 

The extended form of the instruction has a 128-bit encoding only. 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. A result index is written to the ECX register. 
Instruction Support

Form          Subset    Feature Flag
PCMPISTRI     SSE4.2    CPUID Fn0000_0001_ECX[SSE42] (bit 20)
VPCMPISTRI    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                              Opcode               Description
PCMPISTRI xmm1, xmm2/mem128, imm8     66 0F 3A 63 /r ib    Compares packed string data in xmm1 and
                                                           xmm2 or mem128.

Mnemonic                              Encoding
                                      VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPCMPISTRI xmm1, xmm2/mem128, imm8    C4     RXB.03            X.1111.0.01    63 /r ib


Related Instructions 

(V)PCMPESTRI, (V)PCMPESTRM, (V)PCMPISTRM 

rFLAGS Affected 


ID     VIP    VIF    AC     VM     RF     NT     IOPL   OF     DF     IF     TF     SF     ZF     AF     PF     CF
                                                        M                           M      M      0      0      M
21     20     19     18     17     16     14     13:12  11     10     9      8      7      6      4      2      0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag that is set or cleared is M (modified). Unaffected flags are blank.
Undefined flags are U.


MXCSR Flags Affected 

None 
Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
PCMPISTRM
VPCMPISTRM

Packed Compare Implicit Length Strings Return Mask

Compares character string data in the first and second source operands. Comparison operations are
carried out as specified by values encoded in the immediate operand. Writes a mask value to the
YMM0/XMM0 register.

Source operands are formatted as packed characters in one of two supported widths: 8 or 16 bits.
Characters may be treated as either signed or unsigned values. 

Source operand strings shorter than the maximum that can be packed into a 128-bit value are
terminated by a null character (value of 0). The characters prior to the null character constitute the string. If
the first (lowest indexed) character is null, the string length is 0. 

The comparison operations between the two operand strings are summarized in an intermediate
result: a comparison summary bit vector that is post-processed to produce the final output. Data
fields within the immediate byte specify the source data format, comparison type, comparison
summary bit vector post-processing, and output option selection.

Depending on the output option selected, the post-processed comparison summary bit vector is either
zero-extended to 128 bits or expanded into a byte/word-mask and then written to XMM0.
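
The two output options can be sketched in C for the 8-bit character case, where the summary bit vector has 16 bits. The function name and the expand parameter are hypothetical, XMM0 is modeled as 16 bytes with element 0 holding bits [7:0], and the sketch assumes that expansion turns each set bit into a byte of FFh and each clear bit into 00h (see Section 1.5 for the authoritative definition).

```c
#include <stdint.h>
#include <string.h>

/* Writes the modeled XMM0 contents for a 16-bit post-processed vector.
   expand == 0: vector zero-extended to 128 bits.
   expand != 0: each bit expanded into one mask byte. */
void mask_output_bytes(uint16_t vec, int expand, uint8_t xmm0[16])
{
    memset(xmm0, 0, 16);
    if (expand) {
        for (int i = 0; i < 16; i++)
            if ((vec >> i) & 1u)
                xmm0[i] = 0xFF;
    } else {
        xmm0[0] = (uint8_t)(vec & 0xFFu);   /* bits [7:0]  */
        xmm0[1] = (uint8_t)(vec >> 8);      /* bits [15:8] */
    }
}
```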

See Section 1.5, “String Compare Instructions” for information about source string data format,
comparison operations, comparison summary bit vector generation, post-processing, and output selection
options.

The rFLAGS are set to indicate the following conditions: 


Flag    Condition
CF      Cleared if the comparison summary bit vector is zero; otherwise set.
PF      Cleared.
AF      Cleared.
ZF      Set if any byte (word) in the second operand is null; otherwise cleared.
SF      Set if any byte (word) in the first operand is null; otherwise cleared.
OF      Equal to the value of the lsb of the post-processed summary bit vector.


There are legacy and extended forms of the instruction: 

PCMPISTRM 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The mask result is written to the XMM0 register.

VPCMPISTRM 

The extended form of the instruction has a 128-bit encoding only. 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The mask result is written to the XMM0 register. Bits [255:128] of the
YMM0 register are cleared.
Instruction Support

Form          Subset    Feature Flag
PCMPISTRM     SSE4.2    CPUID Fn0000_0001_ECX[SSE42] (bit 20)
VPCMPISTRM    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                              Opcode               Description
PCMPISTRM xmm1, xmm2/mem128, imm8     66 0F 3A 62 /r ib    Compares packed string data in xmm1 and
                                                           xmm2 or mem128. Writes a result or mask
                                                           to the XMM0 register.

Mnemonic                              Encoding
                                      VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPCMPISTRM xmm1, xmm2/mem128, imm8    C4     RXB.03            X.1111.0.01    62 /r ib


Related Instructions 

(V)PCMPESTRI, (V)PCMPESTRM, (V)PCMPISTRI 


rFLAGS Affected 


ID     VIP    VIF    AC     VM     RF     NT     IOPL   OF     DF     IF     TF     SF     ZF     AF     PF     CF
                                                        M                           M      M      0      0      M
21     20     19     18     17     16     14     13:12  11     10     9      8      7      6      4      2      0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag that is set or cleared is M (modified). Unaffected flags are blank.
Undefined flags are U.


MXCSR Flags Affected 

None 
Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
PEXTRB
VPEXTRB

Extract Packed Byte

Extracts a byte from a source register and writes it to an 8-bit memory location or to the low-order 
byte of a general-purpose register, with zero-extension to 32 or 64 bits. Bits [3:0] of an immediate 
byte operand select the byte to be extracted: 


Value of imm8[3:0]    Source Bits Extracted
0000                  [7:0]
0001                  [15:8]
0010                  [23:16]
0011                  [31:24]
0100                  [39:32]
0101                  [47:40]
0110                  [55:48]
0111                  [63:56]
1000                  [71:64]
1001                  [79:72]
1010                  [87:80]
1011                  [95:88]
1100                  [103:96]
1101                  [111:104]
1110                  [119:112]
1111                  [127:120]
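
The selection in the table reduces to indexing the source by imm8[3:0]. A minimal C sketch follows; the function name is hypothetical, the source register is modeled as 16 bytes with element 0 holding bits [7:0], and the return type models the zero-extended register destination.

```c
#include <stdint.h>

/* Model of the byte extract: src[0] holds source bits [7:0],
   src[15] holds bits [127:120]. */
uint32_t pextrb_model(const uint8_t src[16], uint8_t imm8)
{
    unsigned index = imm8 & 0x0Fu;   /* only imm8[3:0] selects the byte */
    return (uint32_t)src[index];     /* zero-extended result */
}
```

In this model the upper four bits of the immediate have no effect on the selection, so an immediate of F5h selects the same byte as 05h.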


There are legacy and extended forms of the instruction:

PEXTRB 

The source operand is an XMM register and the destination is either an 8-bit memory location or the 
low-order byte of a general-purpose register. When the destination is a general-purpose register, the 
extracted byte is zero-extended to 32 or 64 bits. 

VPEXTRB 

The extended form of the instruction has a 128-bit encoding only. 

The source operand is an XMM register and the destination is either an 8-bit memory location or the 
low-order byte of a general-purpose register. When the destination is a general-purpose register, the 
extracted byte is zero-extended to 32 or 64 bits. 

Instruction Support 


Form       Subset    Feature Flag
PEXTRB     SSE4.1    CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPEXTRB    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding 

Mnemonic                       Opcode               Description
PEXTRB reg/m8, xmm, imm8       66 0F 3A 14 /r ib    Extracts an 8-bit value specified by imm8 from xmm
                                                    and writes it to m8 or the low-order byte of a
                                                    general-purpose register, with zero-extension.

Mnemonic                       Encoding
                               VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPEXTRB reg/mem8, xmm, imm8    C4     RXB.03            X.1111.0.01    14 /r ib

Related Instructions 

(V)PEXTRD, (V)PEXTRW, (V)PEXTRQ, (V)PINSRB, (V)PINSRD, (V)PINSRW, (V)PINSRQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
PEXTRD
VPEXTRD

Extract Packed Doubleword

Extracts a doubleword from a source register and writes it to a 32-bit memory location or a 32-bit
general-purpose register. Bits [1:0] of an immediate byte operand select the doubleword to be
extracted:


Value of imm8[1:0]    Source Bits Extracted
00                    [31:0]
01                    [63:32]
10                    [95:64]
11                    [127:96]


There are legacy and extended forms of the instruction:

PEXTRD 

The encoding is the same as PEXTRQ, with REX.W = 0. 

The source operand is an XMM register and the destination is either a 32-bit memory location or a
32-bit general-purpose register.

VPEXTRD 

The extended form of the instruction has a 128-bit encoding only.

The encoding is the same as VPEXTRQ, with VEX.W = 0. 

The source operand is an XMM register and the destination is either a 32-bit memory location or a
32-bit general-purpose register.


Instruction Support 


Form       Subset    Feature Flag
PEXTRD     SSE4.1    CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPEXTRD    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                          Opcode                    Description
PEXTRD reg32/mem32, xmm, imm8     66 (W0) 0F 3A 16 /r ib    Extracts a 32-bit value specified by imm8 from
                                                            xmm and writes it to mem32 or reg32.

Mnemonic                          Encoding
                                  VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPEXTRD reg32/mem32, xmm, imm8    C4     RXB.03            0.1111.0.01    16 /r ib
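
The dotted notation in the encoding table can be read as bit fields packed into the two bytes that follow C4h. The following C sketch packs them; it is a minimal illustration in which the function name is hypothetical, the field widths (3, 5, 1, 4, 1, and 2 bits) are assumed from the general three-byte VEX prefix layout, which this section does not restate, and field values are taken exactly as they appear in the encoded bytes (no inversion is applied).

```c
#include <stdint.h>

/* Packs a three-byte VEX prefix in the order used by the encoding table:
   out[0] = C4h, out[1] = RXB.map_select, out[2] = W.vvvv.L.pp. */
void vex3_pack(uint8_t rxb, uint8_t map_select, uint8_t w,
               uint8_t vvvv, uint8_t l, uint8_t pp, uint8_t out[3])
{
    out[0] = 0xC4;
    out[1] = (uint8_t)(((rxb & 0x7u) << 5) | (map_select & 0x1Fu));
    out[2] = (uint8_t)(((w & 0x1u) << 7) | ((vvvv & 0xFu) << 3) |
                       ((l & 0x1u) << 2) | (pp & 0x3u));
}
```

With RXB assumed to be 111b, the VPEXTRD row (map_select 03, W = 0, vvvv = 1111, L = 0, pp = 01) packs to C4 E3 79; setting W = 1, as for VPEXTRQ, changes the last byte to F9.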





Related Instructions 

(V)PEXTRB, (V)PEXTRW, (V)PEXTRQ, (V)PINSRB, (V)PINSRD, (V)PINSRW, (V)PINSRQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
PEXTRQ
VPEXTRQ

Extract Packed Quadword

Extracts a quadword from a source register and writes it to a 64-bit memory location or to a 64-bit
general-purpose register. Bit [0] of an immediate byte operand selects the quadword to be extracted:


Value of imm8[0]    Source Bits Extracted
0                   [63:0]
1                   [127:64]


There are legacy and extended forms of the instruction:

PEXTRQ 

The encoding is the same as PEXTRD, with REX.W = 1. 

The source operand is an XMM register and the destination is either a 64-bit memory location or a
64-bit general-purpose register.

VPEXTRQ 

The extended form of the instruction has a 128-bit encoding only. 

The encoding is the same as VPEXTRD, with VEX.W = 1. 

The source operand is an XMM register and the destination is either a 64-bit memory location or a
64-bit general-purpose register.


Instruction Support

Form       Subset   Feature Flag
PEXTRQ     SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPEXTRQ    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                          Opcode                    Description
PEXTRQ reg64/mem64, xmm, imm8     66 (W1) 0F 3A 16 /r ib    Extracts a 64-bit value specified by imm8 from xmm and writes it to mem64 or reg64.

Mnemonic                          Encoding
                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPEXTRQ reg64/mem64, xmm, imm8    C4    RXB.03           1.1111.0.01   16 /r ib
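As a worked illustration of the VEX row above, the following Python sketch assembles the bytes for one register-to-register instance of VPEXTRQ. It assumes the standard three-byte VEX layout (C4h, then R.X.B and map_select, then W.vvvv.L.pp), which this section does not restate. The helper name `vex3` and the operand choice (rax, xmm0, imm8 = 1) are hypothetical.

```python
def vex3(r: int, x: int, b: int, map_select: int,
         w: int, vvvv: int, l: int, pp: int) -> bytes:
    """Assemble a three-byte VEX prefix.

    r, x, b and vvvv are given as they appear in the encoded bytes
    (that is, already inverted), so 1 / 1111b means "no extension / unused".
    """
    byte1 = (r << 7) | (x << 6) | (b << 5) | (map_select & 0x1F)
    byte2 = (w << 7) | ((vvvv & 0xF) << 3) | (l << 2) | (pp & 0x3)
    return bytes([0xC4, byte1, byte2])

# VPEXTRQ rax, xmm0, 1 (hypothetical operands)
prefix = vex3(r=1, x=1, b=1, map_select=0x03, w=1, vvvv=0b1111, l=0, pp=0b01)
modrm = 0b11_000_000  # mod = 11b (register form), reg = xmm0, r/m = rax
insn = prefix + bytes([0x16, modrm, 0x01])  # opcode 16h, ModRM, imm8
encoded = insn.hex()
```

Setting W to 0 in the same sketch yields the VPEXTRD form, consistent with the statement that the two encodings differ only in VEX.W.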

Related Instructions 

(V)PEXTRB, (V)PEXTRD, (V)PEXTRW, (V)PINSRB, (V)PINSRD, (V)PINSRW, (V)PINSRQ 

rFLAGS Affected

None

MXCSR Flags Affected

None 


Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X: AVX and SSE exception
A: AVX exception
S: SSE exception

PEXTRW
VPEXTRW

Extract Packed Word

Extracts a word from a source register and writes it to a 16-bit memory location or to the low-order
word of a general-purpose register, with zero-extension to 32 or 64 bits. Bits [2:0] of an immediate
byte operand select the word to be extracted:

Value of imm8[2:0]   Source Bits Extracted
000                  [15:0]
001                  [31:16]
010                  [47:32]
011                  [63:48]
100                  [79:64]
101                  [95:80]
110                  [111:96]
111                  [127:112]
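A minimal Python sketch of the word selection and zero-extension described above follows. The function name `pextrw` and the test pattern are hypothetical, and ignoring immediate bits above bit 2 is an assumption based on the table listing only imm8[2:0].

```python
def pextrw(xmm: int, imm8: int) -> int:
    """Model of (V)PEXTRW with a register destination.

    The result is the selected 16-bit word, zero-extended (never
    sign-extended) into the wider destination.
    """
    index = imm8 & 0b111  # assumption: only imm8[2:0] is used
    return (xmm >> (16 * index)) & 0xFFFF

# Hypothetical source: word i holds 0x8000 + i, so every word has its top bit set.
xmm0 = sum((0x8000 + i) << (16 * i) for i in range(8))
extracted = [pextrw(xmm0, i) for i in range(8)]
```

Because each test word has bit 15 set, a sign-extending model would return values above FFFFh; the zero-extending model keeps every result within 16 bits.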


There are legacy and extended forms of the instruction: 

PEXTRW 

The legacy form of the instruction has SSE2 and SSE4.1 encodings.

In the SSE2 encoding, the source operand is an XMM register and the destination is the low-order
word of a general-purpose register. The extracted word is zero-extended to 32 or 64 bits.

In the SSE4.1 encoding, the source operand is an XMM register and the destination is either a 16-bit
memory location or the low-order word of a general-purpose register. When the destination is a
general-purpose register, the extracted word is zero-extended to 32 or 64 bits.

VPEXTRW 

The extended form of the instruction has two 128-bit encodings that correspond to the two legacy
encodings.

In the encoding that corresponds to the SSE2 form, the source operand is an XMM register and the
destination is the low-order word of a general-purpose register. The extracted word is zero-extended
to 32 or 64 bits.

In the encoding that corresponds to the SSE4.1 form, the source operand is an XMM register and the
destination is either a 16-bit memory location or the low-order word of a general-purpose register.
When the destination is a general-purpose register, the extracted word is zero-extended to 32 or
64 bits.


Instruction Support

Form                Subset   Feature Flag
PEXTRW reg          SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
PEXTRW reg/mem16    SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPEXTRW             AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                        Opcode               Description
PEXTRW reg, xmm, imm8           66 0F C5 /r ib       Extracts a 16-bit value specified by imm8 from xmm and writes it to the low-order word of a general-purpose register, with zero-extension.
PEXTRW reg/m16, xmm, imm8       66 0F 3A 15 /r ib    Extracts a 16-bit value specified by imm8 from xmm and writes it to m16 or the low-order word of a general-purpose register, with zero-extension.

Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPEXTRW reg, xmm, imm8          C4    RXB.01           X.1111.0.01   C5 /r ib
VPEXTRW reg/mem16, xmm, imm8    C4    RXB.03           X.1111.0.01   15 /r ib

Related Instructions

(V)PEXTRB, (V)PEXTRD, (V)PEXTRQ, (V)PINSRB, (V)PINSRD, (V)PINSRW, (V)PINSRQ

rFLAGS Affected 

None 


MXCSR Flags Affected 


None 

Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X: AVX and SSE exception
A: AVX exception
S: SSE exception

PHADDD
VPHADDD

Packed Horizontal Add Doubleword

Adds adjacent 32-bit signed integers in each of two source operands and packs the sums into the
destination. If a sum overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS
is set) and only the low-order 32 bits of the sum are written in the destination.

Adds the 32-bit signed integer values in bits [63:32] and bits [31:0] of the first source operand and 
packs the sum into bits [31:0] of the destination; adds the 32-bit signed integer values in bits [127:96] 
and bits [95:64] of the first source operand and packs the sum into bits [63:32] of the destination. 
Adds the corresponding values in the second source operand and packs the sums into bits [95:64] and 
[127:96] of the destination. 

Additionally, for the 256-bit form, adds the 32-bit signed integer values in bits [191:160] and bits 
[159:128] of the first source operand and packs the sum into bits [159:128] of the destination; adds 
the 32-bit signed integer values in bits [255:224] and bits [223:192] of the first source operand and 
packs the sum into bits [191:160] of the destination. Adds the corresponding values in the second 
source operand and packs the sums into bits [223:192] and [255:224] of the destination. 
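The lane layout described above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not a reference implementation: registers are represented as Python integers, and the names `split32`, `join32` and `phaddd` are hypothetical helpers introduced for the example.

```python
MASK32 = 0xFFFF_FFFF

def split32(value, count):
    """Split an integer into `count` doublewords, lowest first."""
    return [(value >> (32 * i)) & MASK32 for i in range(count)]

def join32(dwords):
    """Pack doublewords (lowest first) back into one integer."""
    return sum(d << (32 * i) for i, d in enumerate(dwords))

def phaddd(src1, src2, bits=128):
    """Model of (V)PHADDD for a 128-bit or 256-bit operand size."""
    out = []
    for lane in range(bits // 128):
        a = split32(src1 >> (128 * lane), 4)
        b = split32(src2 >> (128 * lane), 4)
        # Within each 128-bit lane: two sums from src1, then two from src2.
        # Masking keeps only the low-order 32 bits of each sum.
        out += [(a[0] + a[1]) & MASK32, (a[2] + a[3]) & MASK32,
                (b[0] + b[1]) & MASK32, (b[2] + b[3]) & MASK32]
    return join32(out)

src1 = join32([1, 2, 3, 4])
src2 = join32([10, 20, 30, 0xFFFF_FFFF])  # the last pair overflows 32 bits
result = split32(phaddd(src1, src2), 4)   # [3, 7, 30, 29]
```

The last element shows the wraparound rule: 30 + FFFF_FFFFh exceeds 32 bits, and only the low-order 32 bits (29) are kept.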

There are legacy and extended forms of the instruction:

PHADDD 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination register. Bits [255:128] of
the YMM register that corresponds to the destination are not affected.


VPHADDD 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support

Form               Subset   Feature Flag
PHADDD             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPHADDD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPHADDD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                           Opcode            Description
PHADDD xmm1, xmm2/mem128           66 0F 38 02 /r    Adds adjacent pairs of signed integers in xmm1 and xmm2 or mem128. Writes packed sums to xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPHADDD xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   02 /r
VPHADDD ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   02 /r


Related Instructions 

(V)PHADDW, (V)PHADDSW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PHADDSW
VPHADDSW

Packed Horizontal Add with Saturation Word

Adds adjacent 16-bit signed integers in each of two source operands, with saturation, and packs the 
16-bit signed sums into the destination. 

Positive sums greater than 7FFFh are saturated to 7FFFh; negative sums less than 8000h are saturated 
to 8000h. 

For the 128-bit form of the instruction, the following operations are performed: 

dest is the destination register, either an XMM register or the corresponding YMM register.
src1 is the first source operand. src2 is the second source operand.

Ssum() is a function that returns the saturated 16-bit signed sum of its arguments.

dest[15:0] = Ssum(src1[31:16], src1[15:0])
dest[31:16] = Ssum(src1[63:48], src1[47:32])
dest[47:32] = Ssum(src1[95:80], src1[79:64])
dest[63:48] = Ssum(src1[127:112], src1[111:96])
dest[79:64] = Ssum(src2[31:16], src2[15:0])
dest[95:80] = Ssum(src2[63:48], src2[47:32])
dest[111:96] = Ssum(src2[95:80], src2[79:64])
dest[127:112] = Ssum(src2[127:112], src2[111:96])

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[143:128] = Ssum(src1[159:144], src1[143:128])
dest[159:144] = Ssum(src1[191:176], src1[175:160])
dest[175:160] = Ssum(src1[223:208], src1[207:192])
dest[191:176] = Ssum(src1[255:240], src1[239:224])
dest[207:192] = Ssum(src2[159:144], src2[143:128])
dest[223:208] = Ssum(src2[191:176], src2[175:160])
dest[239:224] = Ssum(src2[223:208], src2[207:192])
dest[255:240] = Ssum(src2[255:240], src2[239:224])
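The Ssum() listing above translates fairly directly into Python. The sketch below is one possible model, with registers held as Python integers; the helper names `signed16`, `ssum`, `words` and `phaddsw` are hypothetical and exist only for this example.

```python
def signed16(word):
    """Interpret a 16-bit pattern as a signed integer."""
    return word - 0x1_0000 if word & 0x8000 else word

def ssum(a, b):
    """Saturated 16-bit signed sum, returned as a 16-bit pattern."""
    total = signed16(a) + signed16(b)
    total = max(-0x8000, min(0x7FFF, total))  # clamp to [8000h, 7FFFh]
    return total & 0xFFFF

def words(value, count=8):
    """Split an integer into 16-bit words, lowest first."""
    return [(value >> (16 * i)) & 0xFFFF for i in range(count)]

def phaddsw(src1, src2, bits=128):
    """Model of (V)PHADDSW for a 128-bit or 256-bit operand size."""
    out = []
    for lane in range(bits // 128):
        for src in (src1, src2):  # four sums from src1, then four from src2
            w = words(src >> (128 * lane))
            out += [ssum(w[2 * i + 1], w[2 * i]) for i in range(4)]
    return sum(v << (16 * i) for i, v in enumerate(out))

# Hypothetical input: the first pair overflows upward, the second downward.
src1 = sum(v << (16 * i)
           for i, v in enumerate([0x7FFF, 1, 0x8000, 0xFFFF, 2, 3, 0, 0]))
result = words(phaddsw(src1, 0))
```

With this input the first pair (7FFFh + 1) saturates to 7FFFh, the second pair (8000h + FFFFh, that is -32768 + -1) saturates to 8000h, and the third pair produces the ordinary sum 5.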

There are legacy and extended forms of the instruction:

PHADDSW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPHADDSW 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.

Instruction Support

Form                Subset   Feature Flag
PHADDSW             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPHADDSW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPHADDSW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                            Opcode            Description
PHADDSW xmm1, xmm2/mem128           66 0F 38 03 /r    Adds adjacent pairs of signed integers in xmm1 and xmm2 or mem128, with saturation. Writes packed sums to xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPHADDSW xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   03 /r
VPHADDSW ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   03 /r


Related Instructions 

(V)PHADDD, (V)PHADDW 

rFLAGS Affected 

None 

MXCSR Flags Affected

None

Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PHADDW
VPHADDW

Packed Horizontal Add Word

Adds adjacent 16-bit signed integers in each of two source operands and packs the 16-bit sums into 
the destination. If a sum overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS 
is set). 

For the 128-bit form of the instruction, the following operations are performed: 

dest is the destination register, either an XMM register or the corresponding YMM register.
src1 is the first source operand. src2 is the second source operand.

dest[15:0] = src1[31:16] + src1[15:0]
dest[31:16] = src1[63:48] + src1[47:32]
dest[47:32] = src1[95:80] + src1[79:64]
dest[63:48] = src1[127:112] + src1[111:96]
dest[79:64] = src2[31:16] + src2[15:0]
dest[95:80] = src2[63:48] + src2[47:32]
dest[111:96] = src2[95:80] + src2[79:64]
dest[127:112] = src2[127:112] + src2[111:96]

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[143:128] = src1[159:144] + src1[143:128]
dest[159:144] = src1[191:176] + src1[175:160]
dest[175:160] = src1[223:208] + src1[207:192]
dest[191:176] = src1[255:240] + src1[239:224]
dest[207:192] = src2[159:144] + src2[143:128]
dest[223:208] = src2[191:176] + src2[175:160]
dest[239:224] = src2[223:208] + src2[207:192]
dest[255:240] = src2[255:240] + src2[239:224]
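To make the "carry is ignored" rule concrete, here is a short Python sketch of the operations listed above. It assumes that ignoring the carry means keeping only the low-order 16 bits of each sum; the function name `phaddw` and the input values are hypothetical.

```python
def phaddw(src1, src2, bits=128):
    """Model of (V)PHADDW: pairwise 16-bit sums with wraparound (no saturation)."""
    out = []
    for lane in range(bits // 128):
        for src in (src1, src2):  # four sums from src1, then four from src2
            w = [(src >> (128 * lane + 16 * i)) & 0xFFFF for i in range(8)]
            out += [(w[2 * i + 1] + w[2 * i]) & 0xFFFF for i in range(4)]
    return sum(v << (16 * i) for i, v in enumerate(out))

src1 = 0x0001_7FFF  # words, lowest first: 7FFFh, 0001h, then zeros
src2 = 0xFFFF_FFFF  # words, lowest first: FFFFh, FFFFh, then zeros
result = phaddw(src1, src2)
first_src1_sum = result & 0xFFFF          # 7FFFh + 1 wraps to 8000h
first_src2_sum = (result >> 64) & 0xFFFF  # FFFFh + FFFFh keeps FFFEh
```

The first sum shows the difference from the saturating variant (V)PHADDSW: 7FFFh + 1 wraps to 8000h instead of being clamped to 7FFFh.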

There are legacy and extended forms of the instruction: 

PHADDW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPHADDW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset   Feature Flag
PHADDW             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPHADDW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPHADDW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                           Opcode            Description
PHADDW xmm1, xmm2/mem128           66 0F 38 01 /r    Adds adjacent pairs of signed integers in xmm1 and xmm2 or mem128. Writes packed sums to xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPHADDW xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   01 /r
VPHADDW ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   01 /r


Related Instructions 

(V)PHADDD, (V)PHADDSW 

rFLAGS Affected 

None 

MXCSR Flags Affected

None

Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PHMINPOSUW
VPHMINPOSUW

Horizontal Minimum and Position

Finds the minimum unsigned 16-bit value in the source operand and copies it to the low-order word
element of the destination. Writes the source position index of the value to bits [18:16] of the
destination and clears bits [127:19] of the destination.
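The result layout described above can be modeled with a short Python sketch. The function name `phminposuw` is hypothetical, and the tie-breaking rule (the lowest index wins when the minimum occurs more than once) is an assumption of this sketch, since the text above does not state it.

```python
def phminposuw(src: int) -> int:
    """Model of (V)PHMINPOSUW on a 128-bit source held as a Python integer."""
    words = [(src >> (16 * i)) & 0xFFFF for i in range(8)]
    minimum = min(words)
    index = words.index(minimum)  # assumption: lowest index wins on ties
    # Bits [15:0] hold the minimum, bits [18:16] the index, bits [127:19] are zero.
    return (index << 16) | minimum

# Hypothetical source words, lowest first; the minimum 4 appears at indices 1 and 3.
src = sum(v << (16 * i) for i, v in enumerate([9, 4, 7, 4, 8, 8, 8, 8]))
dest = phminposuw(src)
min_value = dest & 0xFFFF
min_index = (dest >> 16) & 0b111
```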

There are legacy and extended forms of the instruction: 

PHMINPOSUW 

The source operand is an XMM register or 128-bit memory location. The destination is an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPHMINPOSUW

The extended form of the instruction has a 128-bit encoding only.

The source operand is an XMM register or 128-bit memory location. The destination is an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.


Instruction Support

Form           Subset   Feature Flag
PHMINPOSUW     SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPHMINPOSUW    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                         Opcode            Description
PHMINPOSUW xmm1, xmm2/mem128     66 0F 38 41 /r    Finds the minimum unsigned word element in xmm2 or mem128, copies it to xmm1[15:0], writes its position index to xmm1[18:16], and clears xmm1[127:19].

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPHMINPOSUW xmm1, xmm2/mem128    C4    RXB.02           X.1111.0.01   41 /r

Related Instructions 

(V)PMINSB, (V)PMINSD, (V)PMINSW, (V)PMINUB, (V)PMINUD, (V)PMINUW 

rFLAGS Affected 

None 

MXCSR Flags Affected

None

Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X: AVX and SSE exception
A: AVX exception
S: SSE exception

PHSUBD
VPHSUBD

Packed Horizontal Subtract Doubleword

Subtracts adjacent 32-bit signed integers in each of two source operands and packs the differences 
into the destination. The higher-order doubleword of each pair is subtracted from the lower-order 
doubleword. 

Subtracts the 32-bit signed integer value in bits [63:32] of the first source operand from the 32-bit
signed integer value in bits [31:0] of the first source operand and packs the difference into bits [31:0]
of the destination; subtracts the 32-bit signed integer value in bits [127:96] of the first source operand
from the 32-bit signed integer value in bits [95:64] of the first source operand and packs the
difference into bits [63:32] of the destination. Performs the corresponding operations on pairs of 32-bit
signed integer values in the second source operand and packs the differences into bits [95:64] and
[127:96] of the destination.

Additionally, for the 256-bit form, subtracts the 32-bit signed integer value in bits [191:160] of the
first source operand from the 32-bit signed integer value in bits [159:128] of the first source operand
and packs the difference into bits [159:128] of the destination; subtracts the 32-bit signed integer
value in bits [255:224] of the first source operand from the 32-bit signed integer value in bits
[223:192] of the first source operand and packs the difference into bits [191:160] of the destination.
Performs the corresponding operations on pairs of 32-bit signed integer values in the second source
operand and packs the differences into bits [223:192] and [255:224] of the destination.
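The operand order (lower doubleword minus higher doubleword) is easy to get backwards, so a small Python model may help. It is a sketch under assumptions: registers are Python integers, the names `join32` and `phsubd` are hypothetical, and differences that do not fit in 32 bits are assumed to wrap, keeping the low-order 32 bits.

```python
MASK32 = 0xFFFF_FFFF

def join32(dwords):
    """Pack doublewords (lowest first) into one integer."""
    return sum(d << (32 * i) for i, d in enumerate(dwords))

def phsubd(src1, src2, bits=128):
    """Model of (V)PHSUBD for a 128-bit or 256-bit operand size."""
    out = []
    for lane in range(bits // 128):
        for src in (src1, src2):  # two differences from src1, then two from src2
            d = [(src >> (128 * lane + 32 * i)) & MASK32 for i in range(4)]
            # Lower doubleword minus higher doubleword of each adjacent pair.
            out += [(d[0] - d[1]) & MASK32, (d[2] - d[3]) & MASK32]
    return join32(out)

src1 = join32([10, 3, 5, 8])
src2 = join32([100, 1, 0, 0])
packed = phsubd(src1, src2)
result = [(packed >> (32 * i)) & MASK32 for i in range(4)]
```

Here the second difference, 5 - 8, is negative and appears as FFFF_FFFDh, the two's-complement encoding of -3.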

There are legacy and extended forms of the instruction:

PHSUBD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPHSUBD 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support

Form               Subset   Feature Flag
PHSUBD             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPHSUBD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPHSUBD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                           Opcode            Description
PHSUBD xmm1, xmm2/mem128           66 0F 38 06 /r    Subtracts adjacent pairs of signed integers in xmm1 and xmm2 or mem128. Writes packed differences to xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPHSUBD xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   06 /r
VPHSUBD ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   06 /r


Related Instructions 

(V)PHSUBW, (V)PHSUBSW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PHSUBSW
VPHSUBSW

Packed Horizontal Subtract with Saturation Word

Subtracts adjacent 16-bit signed integers in each of two source operands, with saturation, and packs
the differences into the destination. The higher-order word of each pair is subtracted from the
lower-order word.

Positive differences greater than 7FFFh are saturated to 7FFFh; negative differences less than 8000h 
are saturated to 8000h. 

For the 128-bit form of the instruction, the following operations are performed:

dest is the destination register, either an XMM register or the corresponding YMM register.
src1 is the first source operand. src2 is the second source operand.

Sdiff(A,B) is a function that returns the saturated 16-bit signed difference A - B.

dest[15:0] = Sdiff(src1[15:0], src1[31:16])
dest[31:16] = Sdiff(src1[47:32], src1[63:48])
dest[47:32] = Sdiff(src1[79:64], src1[95:80])
dest[63:48] = Sdiff(src1[111:96], src1[127:112])
dest[79:64] = Sdiff(src2[15:0], src2[31:16])
dest[95:80] = Sdiff(src2[47:32], src2[63:48])
dest[111:96] = Sdiff(src2[79:64], src2[95:80])
dest[127:112] = Sdiff(src2[111:96], src2[127:112])

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[143:128] = Sdiff(src1[143:128], src1[159:144])
dest[159:144] = Sdiff(src1[175:160], src1[191:176])
dest[175:160] = Sdiff(src1[207:192], src1[223:208])
dest[191:176] = Sdiff(src1[239:224], src1[255:240])
dest[207:192] = Sdiff(src2[143:128], src2[159:144])
dest[223:208] = Sdiff(src2[175:160], src2[191:176])
dest[239:224] = Sdiff(src2[207:192], src2[223:208])
dest[255:240] = Sdiff(src2[239:224], src2[255:240])

There are legacy and extended forms of the instruction:

PHSUBSW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPHSUBSW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.

Instruction Support

Form                Subset   Feature Flag
PHSUBSW             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPHSUBSW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPHSUBSW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                            Opcode            Description
PHSUBSW xmm1, xmm2/mem128           66 0F 38 07 /r    Subtracts adjacent pairs of signed integers in xmm1 and xmm2 or mem128, with saturation. Writes packed differences to xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPHSUBSW xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   07 /r
VPHSUBSW ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   07 /r


Related Instructions 

(V)PHSUBD, (V)PHSUBW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
PHSUBW
VPHSUBW

Packed Horizontal Subtract Word

Subtracts adjacent 16-bit signed integers in each of two source operands and packs the differences 
into a destination. The higher-order word of each pair is subtracted from the lower-order word. 

For the 128-bit form of the instruction, the following operations are performed: 

dest is the destination register - either an XMM register or the corresponding YMM register.
src1 is the first source operand. src2 is the second source operand.

dest[15:0] = src1[15:0] - src1[31:16]
dest[31:16] = src1[47:32] - src1[63:48]
dest[47:32] = src1[79:64] - src1[95:80]
dest[63:48] = src1[111:96] - src1[127:112]
dest[79:64] = src2[15:0] - src2[31:16]
dest[95:80] = src2[47:32] - src2[63:48]
dest[111:96] = src2[79:64] - src2[95:80]
dest[127:112] = src2[111:96] - src2[127:112]

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[143:128] = src1[143:128] - src1[159:144]
dest[159:144] = src1[175:160] - src1[191:176]
dest[175:160] = src1[207:192] - src1[223:208]
dest[191:176] = src1[239:224] - src1[255:240]
dest[207:192] = src2[143:128] - src2[159:144]
dest[223:208] = src2[175:160] - src2[191:176]
dest[239:224] = src2[207:192] - src2[223:208]
dest[255:240] = src2[239:224] - src2[255:240]
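
The operation lists above can be modeled in a few lines of Python. This is only an illustrative sketch, not part of the instruction definition: the function names are hypothetical, operands are represented as lists of signed words with element 0 holding bits [15:0], and differences are assumed to wrap to 16 bits because this non-saturating form does not clamp results.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement word."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def phsubw_128(src1, src2):
    """Model the 128-bit operation on two lists of eight signed words.

    Element 0 holds bits [15:0]; element 7 holds bits [127:112].
    """
    dest = []
    for src in (src1, src2):  # src1 fills the low half of dest, src2 the high half
        for i in range(0, 8, 2):
            # Lower-order word minus higher-order word of each adjacent pair.
            dest.append(to_signed16(src[i] - src[i + 1]))
    return dest


def vphsubw_256(src1, src2):
    """Model the 256-bit form: the 128-bit operation applied to each half."""
    return phsubw_128(src1[:8], src2[:8]) + phsubw_128(src1[8:], src2[8:])
```

Note that the 256-bit form does not subtract across the 128-bit boundary; each half of the destination is built only from the matching halves of the two sources.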

There are legacy and extended forms of the instruction: 

PHSUBW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination register. Bits [255:128] of 
the YMM register that corresponds to the destination are not affected. 

VPHSUBW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 
Instruction Support

Form              Subset   Feature Flag
PHSUBW            SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPHSUBW 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPHSUBW 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
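
As a rough illustration of how the feature flags listed for PHSUBW and VPHSUBW map onto register bits, the following Python sketch decodes CPUID register values that have already been read. The function name and the approach of passing in raw register values are assumptions made for illustration; executing CPUID itself is outside the sketch, and the operating-system enablement checks listed under Exceptions (CR4.OSXSAVE, XFEATURE_ENABLED_MASK) are not modeled.

```python
def phsubw_forms_supported(fn1_ecx, fn7_ebx):
    """Report which PHSUBW/VPHSUBW forms the CPUID feature flags advertise.

    fn1_ecx: ECX returned by CPUID Fn0000_0001.
    fn7_ebx: EBX returned by CPUID Fn0000_0007 (subfunction 0).
    """
    def bit(value, position):
        return bool((value >> position) & 1)

    return {
        "PHSUBW": bit(fn1_ecx, 9),            # SSSE3
        "VPHSUBW 128-bit": bit(fn1_ecx, 28),  # AVX
        "VPHSUBW 256-bit": bit(fn7_ebx, 5),   # AVX2
    }
```

For example, a register image with ECX bits 9 and 28 set and EBX bit 5 clear reports the legacy and 128-bit extended forms as available and the 256-bit form as unavailable.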


Instruction Encoding

Mnemonic                    Opcode            Description
PHSUBW xmm1, xmm2/mem128    66 0F 38 05 /r    Subtracts adjacent pairs of signed integers in xmm1
                                              and xmm2 or mem128. Writes packed differences to
                                              xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPHSUBW xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   05 /r
VPHSUBW ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   05 /r
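
To make the VEX columns more concrete, the sketch below assembles the bytes of a register-to-register VPHSUBW. It is a hedged illustration, not an assembler: the helper names are hypothetical, only registers 0 through 7 are handled (so the R, X, and B extension bits stay at their unextended values), the W bit, shown as X (ignored) in the table, is arbitrarily set to 0, and the inverted storage of the R, X, B, and vvvv fields follows the general three-byte VEX prefix definition rather than anything stated in this entry.

```python
def vex3(map_select, w, vvvv_reg, l, pp, r=0, x=0, b=0):
    """Build a three-byte VEX prefix (C4h escape).

    R, X, B and the vvvv register number are stored in inverted form.
    """
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (map_select & 0x1F)
    byte2 = ((w & 1) << 7) | ((~vvvv_reg & 0xF) << 3) | ((l & 1) << 2) | (pp & 3)
    return bytes([0xC4, byte1, byte2])


def encode_vphsubw_reg(dest, src1, src2, l=0):
    """Encode VPHSUBW dest, src1, src2 for registers 0-7 (l=1 selects YMM)."""
    # ModRM: mod=11b (register operand), reg=dest, r/m=src2.
    modrm = 0xC0 | ((dest & 7) << 3) | (src2 & 7)
    # map_select=02h, pp=01b, opcode 05h, as listed in the encoding table.
    return vex3(0x02, 0, src1, l, 0x01) + bytes([0x05, modrm])
```

Under these assumptions, `encode_vphsubw_reg(1, 2, 3)` yields the bytes C4 E2 69 05 CB, and passing `l=1` changes only the L bit in the third byte.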


Related Instructions 

(V)PHSUBD, (V)PHSUBSW

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
PINSRB
VPINSRB

Packed Insert Byte

Inserts a byte from an 8-bit memory location or the low-order byte of a 32-bit general-purpose
register into a destination register. Bits [3:0] of an immediate byte operand select the location
where the byte is to be inserted:


Value of imm8[3:0]    Insertion Location
0000                  [7:0]
0001                  [15:8]
0010                  [23:16]
0011                  [31:24]
0100                  [39:32]
0101                  [47:40]
0110                  [55:48]
0111                  [63:56]
1000                  [71:64]
1001                  [79:72]
1010                  [87:80]
1011                  [95:88]
1100                  [103:96]
1101                  [111:104]
1110                  [119:112]
1111                  [127:120]


There are legacy and extended forms of the instruction:

PINSRB 

The source operand is either an 8-bit memory location or the low-order byte of a 32-bit
general-purpose register, and the destination is an XMM register. The other bytes of the destination
are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.

VPINSRB 

The extended form of the instruction has a 128-bit encoding only. 

There are two source operands. The first source operand is either an 8-bit memory location or the
low-order byte of a 32-bit general-purpose register, and the second source operand is an XMM
register. The destination is a second XMM register. All the bytes of the second source other than the
byte that corresponds to the location of the inserted byte are copied to the destination. Bits
[255:128] of the YMM register that corresponds to the destination are cleared.
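
The difference between the legacy and extended forms can be sketched in Python as follows. This is a minimal model under stated assumptions, not a definitive description: the function names are hypothetical, a YMM register is represented as a list of 32 byte values with element 0 holding bits [7:0], and only the data movement described above is modeled.

```python
def insert_byte(xmm_bytes, value, imm8):
    """Return a copy of 16 bytes with the low byte of value placed at imm8[3:0]."""
    result = list(xmm_bytes)
    result[imm8 & 0x0F] = value & 0xFF  # only bits [3:0] of imm8 select the position
    return result


def pinsrb(ymm_dest, value, imm8):
    """Legacy form: modify the low 128 bits in place, keep bits [255:128]."""
    return insert_byte(ymm_dest[:16], value, imm8) + list(ymm_dest[16:])


def vpinsrb(xmm_src2, value, imm8):
    """Extended form: copy the second source, insert, clear bits [255:128]."""
    return insert_byte(xmm_src2[:16], value, imm8) + [0] * 16
```

Because only the low four bits of the immediate are used, an immediate of 13h selects the same byte position as 03h in this model.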
Instruction Support

Form      Subset   Feature Flag
PINSRB    SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPINSRB   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                        Opcode               Description
PINSRB xmm, reg32/mem8, imm8    66 0F 3A 20 /r ib    Inserts an 8-bit value selected by imm8 from the
                                                     low-order byte of reg32 or from mem8 into xmm.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPINSRB xmm, reg/mem8, xmm, imm8    C4    RXB.03           X.1111.0.01   20 /r ib

Related Instructions 

(V)PEXTRB, (V)PEXTRD, (V)PEXTRQ, (V)PEXTRW, (V)PINSRD, (V)PINSRQ, (V)PINSRW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
PINSRD
VPINSRD

Packed Insert Doubleword

Inserts a doubleword from a 32-bit memory location or a 32-bit general-purpose register into a
destination register. Bits [1:0] of an immediate byte operand select the location where the
doubleword is to be inserted:


Value of imm8[1:0]    Insertion Location
00                    [31:0]
01                    [63:32]
10                    [95:64]
11                    [127:96]


There are legacy and extended forms of the instruction:

PINSRD 

The encoding is the same as PINSRQ, with REX.W = 0. 

The source operand is either a 32-bit memory location or a 32-bit general-purpose register, and the
destination is an XMM register. The other doublewords of the destination are not affected. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.

VPINSRD 

The extended form of the instruction has a 128-bit encoding only.

The encoding is the same as VPINSRQ, with VEX.W = 0. 

There are two source operands. The first source operand is either a 32-bit memory location or a 32-bit
general-purpose register, and the second source operand is an XMM register. The destination is a
second XMM register. All the doublewords of the second source other than the doubleword that
corresponds to the location of the inserted doubleword are copied to the destination. Bits [255:128]
of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form      Subset   Feature Flag
PINSRD    SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPINSRD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                         Opcode                    Description
PINSRD xmm, reg32/mem32, imm8    66 (W0) 0F 3A 22 /r ib    Inserts a 32-bit value selected by imm8 from
                                                           reg32 or mem32 into xmm.

Mnemonic                               Encoding
                                       VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPINSRD xmm, reg32/mem32, xmm, imm8    C4    RXB.03           0.1111.0.01   22 /r ib


Related Instructions 

(V)PEXTRB, (V)PEXTRD, (V)PEXTRQ, (V)PEXTRW, (V)PINSRB, (V)PINSRQ, (V)PINSRW 

rFLAGS Affected 

None 


MXCSR Flags Affected 

None 


Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
PINSRQ
VPINSRQ

Packed Insert Quadword

Inserts a quadword from a 64-bit memory location or a 64-bit general-purpose register into a
destination register. Bit [0] of an immediate byte operand selects the location where the quadword is
to be inserted:


Value of imm8[0]    Insertion Location
0                   [63:0]
1                   [127:64]


There are legacy and extended forms of the instruction: 

PINSRQ 

The encoding is the same as PINSRD, with REX.W = 1. 

The source operand is either a 64-bit memory location or a 64-bit general-purpose register, and the
destination is an XMM register. The other quadword of the destination is not affected. Bits [255:128]
of the YMM register that corresponds to the destination are not affected.

VPINSRQ 

The extended form of the instruction has a 128-bit encoding only.

The encoding is the same as VPINSRD, with VEX.W = 1. 

There are two source operands. The first source operand is either a 64-bit memory location or a 64-bit
general-purpose register, and the second source operand is an XMM register. The destination is a
second XMM register. The quadword of the second source other than the quadword that corresponds
to the location of the inserted quadword is copied to the destination. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
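
Since PINSRD and PINSRQ share one encoding and differ only in the W bit (REX.W for the legacy forms, VEX.W for the extended forms), their data movement can be modeled by one function. The sketch below is illustrative only: the function name is hypothetical, the XMM register is represented as a 128-bit Python integer, and the handling of bits [255:128] is left out.

```python
def pinsr_dq(xmm, src, imm8, w):
    """Insert a doubleword (w=0) or quadword (w=1) into a 128-bit value.

    xmm: current low 128 bits (legacy destination or extended second source).
    src: value from the general-purpose register or memory operand.
    """
    if w:
        width, index = 64, imm8 & 0x1  # bit [0] selects one of two quadwords
    else:
        width, index = 32, imm8 & 0x3  # bits [1:0] select one of four doublewords
    mask = (1 << width) - 1
    shift = index * width
    # Clear the selected element, then merge in the truncated source value.
    return (xmm & ~(mask << shift)) | ((src & mask) << shift)
```

With `w=0` and an immediate of 2, the source lands in bits [95:64]; with `w=1` and an immediate of 1, it lands in bits [127:64].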

Instruction Support

Form      Subset   Feature Flag
PINSRQ    SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPINSRQ   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                         Opcode                    Description
PINSRQ xmm, reg64/mem64, imm8    66 (W1) 0F 3A 22 /r ib    Inserts a 64-bit value selected by imm8 from
                                                           reg64 or mem64 into xmm.

Mnemonic                               Encoding
                                       VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPINSRQ xmm, reg64/mem64, xmm, imm8    C4    RXB.03           1.1111.0.01   22 /r ib
Related Instructions

(V)PEXTRB, (V)PEXTRD, (V)PEXTRQ, (V)PEXTRW, (V)PINSRB, (V)PINSRD, (V)PINSRW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
PINSRW
VPINSRW

Packed Insert Word

Inserts a word from a 16-bit memory location or the low-order word of a 32-bit general-purpose
register into a destination register. Bits [2:0] of an immediate byte operand select the location
where the word is to be inserted:


Value of imm8[2:0]    Insertion Location
000                   [15:0]
001                   [31:16]
010                   [47:32]
011                   [63:48]
100                   [79:64]
101                   [95:80]
110                   [111:96]
111                   [127:112]


There are legacy and extended forms of the instruction: 

PINSRW 

The source operand is either a 16-bit memory location or the low-order word of a 32-bit
general-purpose register, and the destination is an XMM register. The other words of the destination
are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.

VPINSRW 

The extended form of the instruction has a 128-bit encoding only. 

There are two source operands. The first source operand is either a 16-bit memory location or the
low-order word of a 32-bit general-purpose register, and the second source operand is an XMM
register. The destination is an XMM register. All the words of the second source other than the word
that corresponds to the location of the inserted word are copied to the destination. Bits [255:128] of
the YMM register that corresponds to the destination are cleared.

Instruction Support

Form      Subset   Feature Flag
PINSRW    SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VPINSRW   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                         Opcode            Description
PINSRW xmm, reg32/mem16, imm8    66 0F C4 /r ib    Inserts a 16-bit value selected by imm8 from the
                                                   low-order word of reg32 or from mem16 into xmm.

Mnemonic                               Encoding
                                       VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPINSRW xmm, reg32/mem16, xmm, imm8    C4    RXB.01           X.1111.0.01   C4 /r ib

Related Instructions 

(V)PEXTRB, (V)PEXTRD, (V)PEXTRQ, (V)PEXTRW, (V)PINSRB, (V)PINSRD, (V)PINSRQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
PMADDUBSW
VPMADDUBSW

Packed Multiply and Add Unsigned Byte to Signed Word

Multiplies and adds sets of two packed 8-bit unsigned values from the first source operand and two 
packed 8-bit signed values from the second source operand, with signed saturation; writes eight 16-bit 
sums to the destination. 

For the 128-bit form of the instruction, the following operations are performed: 

dest is the destination register - either an XMM register or the corresponding YMM register.
src1 is the first source operand. src2 is the second source operand.

Ssum() is a function that returns the saturated 16-bit signed sum of its arguments.

dest[15:0] = Ssum(src1[7:0] * src2[7:0], src1[15:8] * src2[15:8])
dest[31:16] = Ssum(src1[23:16] * src2[23:16], src1[31:24] * src2[31:24])
dest[47:32] = Ssum(src1[39:32] * src2[39:32], src1[47:40] * src2[47:40])
dest[63:48] = Ssum(src1[55:48] * src2[55:48], src1[63:56] * src2[63:56])
dest[79:64] = Ssum(src1[71:64] * src2[71:64], src1[79:72] * src2[79:72])
dest[95:80] = Ssum(src1[87:80] * src2[87:80], src1[95:88] * src2[95:88])
dest[111:96] = Ssum(src1[103:96] * src2[103:96], src1[111:104] * src2[111:104])
dest[127:112] = Ssum(src1[119:112] * src2[119:112], src1[127:120] * src2[127:120])

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[143:128] = Ssum(src1[135:128] * src2[135:128], src1[143:136] * src2[143:136])
dest[159:144] = Ssum(src1[151:144] * src2[151:144], src1[159:152] * src2[159:152])
dest[175:160] = Ssum(src1[167:160] * src2[167:160], src1[175:168] * src2[175:168])
dest[191:176] = Ssum(src1[183:176] * src2[183:176], src1[191:184] * src2[191:184])
dest[207:192] = Ssum(src1[199:192] * src2[199:192], src1[207:200] * src2[207:200])
dest[223:208] = Ssum(src1[215:208] * src2[215:208], src1[223:216] * src2[223:216])
dest[239:224] = Ssum(src1[231:224] * src2[231:224], src1[239:232] * src2[239:232])
dest[255:240] = Ssum(src1[247:240] * src2[247:240], src1[255:248] * src2[255:248])
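
A compact way to read the operation lists above is as a loop over adjacent byte pairs. The following Python sketch models the 128-bit operation; it is illustrative only, the function names are hypothetical, and the operands are assumed to be supplied already interpreted (src1 as unsigned bytes 0 to 255, src2 as signed bytes -128 to 127), with element 0 holding bits [7:0].

```python
def saturate16(value):
    """Clamp value to the signed 16-bit range, as Ssum() does for its sum."""
    return max(-32768, min(32767, value))


def pmaddubsw_128(src1, src2):
    """Model the 128-bit operation.

    src1: 16 unsigned byte values; src2: 16 signed byte values.
    Returns eight saturated signed 16-bit sums.
    """
    return [
        saturate16(src1[i] * src2[i] + src1[i + 1] * src2[i + 1])
        for i in range(0, 16, 2)
    ]
```

Saturation matters because two products of an unsigned byte and a signed byte can sum to a value outside the 16-bit signed range; for instance, 255 * 127 + 255 * 127 is 64,770, which the model clamps to 32,767.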

There are legacy and extended forms of the instruction:

PMADDUBSW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMADDUBSW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 
Instruction Support

Form                 Subset   Feature Flag
PMADDUBSW            SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPMADDUBSW 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMADDUBSW 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                       Opcode            Description
PMADDUBSW xmm1, xmm2/mem128    66 0F 38 04 /r    Multiplies packed 8-bit unsigned values in xmm1 and
                                                 packed 8-bit signed values in xmm2 or mem128, adds
                                                 the products, and writes saturated sums to xmm1.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMADDUBSW xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   04 /r
VPMADDUBSW ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   04 /r


Related Instructions 

(V)PMADDWD 

rFLAGS Affected 

None 


MXCSR Flags Affected 

None 
Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
PMADDWD
VPMADDWD

Packed Multiply and Add Word to Doubleword

Multiplies and adds sets of four packed 16-bit signed values from two source registers; writes four 
32-bit sums to the destination. 

For the 128-bit form of the instruction, the following operations are performed: 

dest is the destination register - either an XMM register or the corresponding YMM register.
src1 is the first source operand. src2 is the second source operand.

dest[31:0] = (src1[15:0] * src2[15:0]) + (src1[31:16] * src2[31:16])
dest[63:32] = (src1[47:32] * src2[47:32]) + (src1[63:48] * src2[63:48])
dest[95:64] = (src1[79:64] * src2[79:64]) + (src1[95:80] * src2[95:80])
dest[127:96] = (src1[111:96] * src2[111:96]) + (src1[127:112] * src2[127:112])

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[159:128] = (src1[143:128] * src2[143:128]) + (src1[159:144] * src2[159:144])
dest[191:160] = (src1[175:160] * src2[175:160]) + (src1[191:176] * src2[191:176])
dest[223:192] = (src1[207:192] * src2[207:192]) + (src1[223:208] * src2[223:208])
dest[255:224] = (src1[239:224] * src2[239:224]) + (src1[255:240] * src2[255:240])

When all four of the signed 16-bit source operands in a set have the value 8000h, the 32-bit overflow 
wraps around to 8000_0000h. There are no other overflow cases. 
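
The single overflow case described above can be reproduced with a short Python model of the 128-bit operation. This is a sketch under stated assumptions rather than a definitive implementation: the function names are hypothetical, and operands are lists of eight signed 16-bit values with element 0 holding bits [15:0].

```python
def wrap32(value):
    """Reduce value to a signed 32-bit result with two's-complement wraparound."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def pmaddwd_128(src1, src2):
    """Model the 128-bit operation on two lists of eight signed words.

    Returns four signed 32-bit sums of adjacent word products.
    """
    return [
        wrap32(src1[i] * src2[i] + src1[i + 1] * src2[i + 1])
        for i in range(0, 8, 2)
    ]
```

When all four words of a set are 8000h (-32,768), each product is 4000_0000h and their sum is 8000_0000h, which does not fit in a signed doubleword and reads back as -2,147,483,648. Any other combination of inputs stays within range, so the wraparound in the model is only ever exercised by that case.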

There are legacy and extended forms of the instruction: 

PMADDWD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMADDWD 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support

Form               Subset   Feature Flag
PMADDWD            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMADDWD 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMADDWD 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                     Opcode         Description
PMADDWD xmm1, xmm2/mem128    66 0F F5 /r    Multiplies packed 16-bit signed values in xmm1 and
                                            xmm2 or mem128, adds the products, and writes the
                                            sums to xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMADDWD xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   F5 /r
VPMADDWD ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   F5 /r


Related Instructions 

(V)PMADDUBSW, (V)PMULHUW, (V)PMULHW, (V)PMULLW, (V)PMULUDQ 


rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
PMAXSB
VPMAXSB

Packed Maximum Signed Bytes

Compares each packed 8-bit signed integer value of the first source operand to the corresponding 
value of the second source operand and writes the numerically greater value into the corresponding 
byte of the destination. 

The 128-bit form of the instruction compares 16 pairs of 8-bit signed integer values; the 256-bit form 
compares 32 pairs. 
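
The signed comparison is the point that is easiest to get wrong, so a small Python model may help. It is a sketch only: the function names are hypothetical, and operands are given as raw byte strings (16 bytes for the 128-bit form, 32 for the 256-bit form) with element 0 holding bits [7:0].

```python
def to_signed8(byte):
    """Interpret a raw byte value (0-255) as a two's-complement signed byte."""
    byte &= 0xFF
    return byte - 0x100 if byte & 0x80 else byte


def pmaxsb(src1, src2):
    """Return, byte by byte, whichever raw byte is numerically greater
    when both are interpreted as signed values."""
    return bytes(
        a if to_signed8(a) >= to_signed8(b) else b
        for a, b in zip(src1, src2)
    )
```

Under this interpretation 80h is -128 and therefore loses to 01h, whereas an unsigned comparison would pick 80h.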

There are legacy and extended forms of the instruction: 

PMAXSB 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMAXSB 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The first source operand is an XMM register and the second source operand is either an XMM
register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support

Form              Subset   Feature Flag
PMAXSB            SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMAXSB 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMAXSB 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                    Opcode            Description
PMAXSB xmm1, xmm2/mem128    66 0F 38 3C /r    Compares 16 pairs of packed 8-bit values in xmm1 and
                                              xmm2 or mem128 and writes the greater values to the
                                              corresponding positions in xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMAXSB xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   3C /r
VPMAXSB ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   3C /r


Related Instructions

(V)PMAXSD, (V)PMAXSW, (V)PMAXUB, (V)PMAXUD, (V)PMAXUW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
PMAXSD
VPMAXSD

Packed Maximum Signed Doublewords

Compares each packed 32-bit signed integer value of the first source operand to the corresponding 
value of the second source operand and writes the numerically greater value into the corresponding 
doubleword of the destination. 

The 128-bit form of the instruction compares four pairs of 32-bit signed integer values; the 256-bit
form compares eight pairs.

There are legacy and extended forms of the instruction: 

PMAXSD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMAXSD 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register
or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 
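
The lane-by-lane comparison described above can be modeled in a few lines of Python. This is a minimal reference sketch and not a description of the hardware: register images are represented as plain integers, and the helper names (`split_lanes`, `to_signed`, `pack_dwords`, `pmaxsd_model`) are hypothetical names introduced only for illustration.

```python
def split_lanes(value, total_bits, lane_bits):
    """Split a register image (an int) into lanes, least significant first."""
    mask = (1 << lane_bits) - 1
    return [(value >> shift) & mask for shift in range(0, total_bits, lane_bits)]


def to_signed(lane, lane_bits):
    """Reinterpret an unsigned lane value as a two's-complement signed value."""
    sign_bit = 1 << (lane_bits - 1)
    return lane - (1 << lane_bits) if lane & sign_bit else lane


def pack_dwords(dwords):
    """Pack signed or unsigned 32-bit values into one integer, lane 0 first."""
    image = 0
    for index, dword in enumerate(dwords):
        image |= (dword & 0xFFFFFFFF) << (32 * index)
    return image


def pmaxsd_model(src1, src2, total_bits=128):
    """Model (V)PMAXSD: keep the numerically greater signed doubleword per lane."""
    result = 0
    lanes1 = split_lanes(src1, total_bits, 32)
    lanes2 = split_lanes(src2, total_bits, 32)
    for index, (a, b) in enumerate(zip(lanes1, lanes2)):
        winner = a if to_signed(a, 32) >= to_signed(b, 32) else b
        result |= winner << (32 * index)
    return result


# Four pairs for the 128-bit form; pass total_bits=256 for the eight-pair form.
a = pack_dwords([1, -2, 0x7FFFFFFF, -5])
b = pack_dwords([-1, -3, -0x80000000, 7])
assert pmaxsd_model(a, b) == pack_dwords([1, -2, 0x7FFFFFFF, 7])
```

The model compares sign-adjusted values but writes back the original bit pattern of the winning lane, which mirrors the way the description speaks of writing the "numerically greater value" into the destination doubleword.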

Instruction Support 


Form               Subset   Feature Flag
PMAXSD             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMAXSD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMAXSD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
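
As an illustration of how the feature flags in the table map onto register bits, the sketch below tests the three bit positions listed for (V)PMAXSD. Python cannot execute CPUID itself, so the function takes the register values as arguments; how those values are obtained is left to the caller. The function name `pmaxsd_forms_reported` is hypothetical.

```python
# Bit positions taken from the Instruction Support table.
SSE41_BIT = 19   # CPUID Fn0000_0001_ECX[SSE41]
AVX_BIT = 28     # CPUID Fn0000_0001_ECX[AVX]
AVX2_BIT = 5     # CPUID Fn0000_0007_EBX[AVX2]_x0


def pmaxsd_forms_reported(fn1_ecx, fn7_ebx_x0):
    """Return the (V)PMAXSD forms reported by the given CPUID register values.

    fn1_ecx:     ECX returned by CPUID function 0000_0001h
    fn7_ebx_x0:  EBX returned by CPUID function 0000_0007h, subfunction 0
    """
    forms = []
    if (fn1_ecx >> SSE41_BIT) & 1:
        forms.append("PMAXSD")
    if (fn1_ecx >> AVX_BIT) & 1:
        forms.append("VPMAXSD 128-bit")
    if (fn7_ebx_x0 >> AVX2_BIT) & 1:
        forms.append("VPMAXSD 256-bit")
    return forms
```

A set feature bit only indicates processor support. The exception table for the instruction lists further conditions, such as CR4.OSXSAVE and XFEATURE_ENABLED_MASK, that must also hold before the extended forms can execute.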


Instruction Encoding 

Mnemonic                           Opcode            Description
PMAXSD xmm1, xmm2/mem128           66 0F 38 3D /r    Compares four pairs of packed 32-bit values in xmm1
                                                     and xmm2 or mem128 and writes the greater values to
                                                     the corresponding positions in xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMAXSD xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   3D /r
VPMAXSD ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   3D /r
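
To make the VEX columns concrete, the following sketch assembles the bytes of a register-to-register VPMAXSD. It assumes the standard three-byte VEX layout, which is not spelled out in this table: the second byte holds the inverted R, X, and B bits followed by the 5-bit map_select, and the third byte holds W, the inverted vvvv field, L, and pp. The function names are hypothetical, W is set to 0 because the table marks it as don't-care (X), and only registers 0 through 7 are handled.

```python
def encode_vex3(r, x, b, map_select, w, vvvv, l, pp):
    """Build the three-byte VEX prefix (C4h form) from logical field values.

    r, x, b and vvvv are given as logical (non-inverted) values; the prefix
    stores them in inverted form.
    """
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (map_select & 0x1F)
    byte2 = ((w & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 0x3)
    return bytes([0xC4, byte1, byte2])


def modrm_reg_reg(reg, rm):
    """ModRM byte for a register-direct operand pair (mod = 11b)."""
    return bytes([0xC0 | ((reg & 7) << 3) | (rm & 7)])


def encode_vpmaxsd(dest, src1, src2, ymm=False):
    """Encode VPMAXSD dest, src1, src2 for register numbers 0-7."""
    prefix = encode_vex3(r=0, x=0, b=0, map_select=0x02, w=0,
                         vvvv=src1, l=1 if ymm else 0, pp=0b01)
    return prefix + bytes([0x3D]) + modrm_reg_reg(dest, src2)


# VPMAXSD xmm1, xmm2, xmm3
assert encode_vpmaxsd(1, 2, 3).hex() == "c4e2693dcb"
```

The only difference between the XMM and YMM encodings in this sketch is the L bit, which matches the `X.src1.0.01` and `X.src1.1.01` entries in the table.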


Related Instructions

(V)PMAXSB, (V)PMAXSW, (V)PMAXUB, (V)PMAXUD, (V)PMAXUW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PMAXSW, VPMAXSW
Packed Maximum Signed Words

Compares each packed 16-bit signed integer value of the first source operand to the corresponding 
value of the second source operand and writes the numerically greater value into the corresponding 
word of the destination. 

The 128-bit form of the instruction compares eight pairs of 16-bit signed integer values; the 256-bit
form compares 16 pairs.

There are legacy and extended forms of the instruction: 

PMAXSW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMAXSW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register
or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 
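
The legacy and XMM-encoded extended forms differ in how they treat the upper half of the corresponding YMM register. The sketch below models only that write-back step, with the 256-bit register held as a Python integer; `write_legacy` and `write_vex128` are hypothetical helper names.

```python
LOW128 = (1 << 128) - 1


def write_legacy(dest_ymm, result128):
    """Legacy SSE write: replace bits [127:0], leave bits [255:128] unchanged."""
    return (dest_ymm & ~LOW128) | (result128 & LOW128)


def write_vex128(dest_ymm, result128):
    """VEX.128 write: replace bits [127:0] and clear bits [255:128].

    dest_ymm is accepted only for symmetry; its old contents are discarded.
    """
    return result128 & LOW128


old = (0xAB << 128) | 0x1234
assert write_legacy(old, 0x55) == (0xAB << 128) | 0x55   # upper half preserved
assert write_vex128(old, 0x55) == 0x55                   # upper half cleared
```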

Instruction Support 


Form               Subset   Feature Flag
PMAXSW             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMAXSW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMAXSW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                           Opcode            Description
PMAXSW xmm1, xmm2/mem128           66 0F EE /r       Compares eight pairs of packed 16-bit values in xmm1
                                                     and xmm2 or mem128 and writes the greater values to
                                                     the corresponding positions in xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMAXSW xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   EE /r
VPMAXSW ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   EE /r


Related Instructions

(V)PMAXSB, (V)PMAXSD, (V)PMAXUB, (V)PMAXUD, (V)PMAXUW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
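
The two alignment-check rows of the table can be read as a small predicate. The sketch below is one possible reading, under stated assumptions: "alignment checking enabled" is collapsed into a single boolean supplied by the caller, the operating mode is not modeled, and the function name `alignment_check_applies` is hypothetical.

```python
def alignment_check_applies(form, address, operand_bytes, alignment_checking, mxcsr_mm):
    """Return True when one of the #AC rows in the table matches.

    form:               "legacy" (S rows) or "vex" (A rows)
    operand_bytes:      16 for a 128-bit operand, 32 for a 256-bit operand
    alignment_checking: True when alignment checking is enabled (assumed input)
    mxcsr_mm:           value of MXCSR.MM (0 or 1)
    """
    if not alignment_checking:
        return False
    if form == "legacy":
        # S row: operand not 16-byte aligned and MXCSR.MM = 1.
        return mxcsr_mm == 1 and address % 16 != 0
    if form == "vex":
        # A row: operand not aligned to its own size (16 or 32 bytes).
        return address % operand_bytes != 0
    raise ValueError("form must be 'legacy' or 'vex'")


assert alignment_check_applies("legacy", 0x1008, 16, True, 1)
assert not alignment_check_applies("legacy", 0x1008, 16, True, 0)
assert alignment_check_applies("vex", 0x1010, 32, True, 0)
```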
PMAXUB, VPMAXUB
Packed Maximum Unsigned Bytes

Compares each packed 8-bit unsigned integer value of the first source operand to the corresponding 
value of the second source operand and writes the numerically greater value into the corresponding 
byte of the destination. 

The 128-bit form of the instruction compares 16 pairs of 8-bit unsigned integer values; the 256-bit
form compares 32 pairs.

There are legacy and extended forms of the instruction: 

PMAXUB 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMAXUB 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register
or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 
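
The unsigned and signed byte forms can select different winners for the same bit patterns. The short sketch below shows this for a single byte pair; the helper names are hypothetical, and the functions model one lane only.

```python
def as_signed_byte(value):
    """Reinterpret an 8-bit pattern as a two's-complement signed value."""
    return value - 256 if value & 0x80 else value


def max_unsigned_byte(a, b):
    """Per-lane selection in the style of (V)PMAXUB: compare raw byte values."""
    return a if a >= b else b


def max_signed_byte(a, b):
    """Per-lane selection in the style of (V)PMAXSB: compare signed values."""
    return a if as_signed_byte(a) >= as_signed_byte(b) else b


# 80h is 128 unsigned but -128 signed, so the two forms disagree.
assert max_unsigned_byte(0x80, 0x7F) == 0x80
assert max_signed_byte(0x80, 0x7F) == 0x7F
```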

Instruction Support 


Form               Subset   Feature Flag
PMAXUB             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMAXUB 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMAXUB 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                           Opcode            Description
PMAXUB xmm1, xmm2/mem128           66 0F DE /r       Compares 16 pairs of packed unsigned 8-bit values in
                                                     xmm1 and xmm2 or mem128 and writes the greater
                                                     values to the corresponding positions in xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMAXUB xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   DE /r
VPMAXUB ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   DE /r


Related Instructions

(V)PMAXSB, (V)PMAXSD, (V)PMAXSW, (V)PMAXUD, (V)PMAXUW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 


PMAXUD, VPMAXUD
Packed Maximum Unsigned Doublewords

Compares each packed 32-bit unsigned integer value of the first source operand to the corresponding 
value of the second source operand and writes the numerically greater value into the corresponding 
doubleword of the destination. 

The 128-bit form of the instruction compares four pairs of 32-bit unsigned integer values; the 256-bit
form compares eight pairs.

There are legacy and extended forms of the instruction: 

PMAXUD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMAXUD 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register
or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form               Subset   Feature Flag
PMAXUD             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMAXUD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMAXUD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                           Opcode            Description
PMAXUD xmm1, xmm2/mem128           66 0F 38 3F /r    Compares four pairs of packed unsigned 32-bit values
                                                     in xmm1 and xmm2 or mem128 and writes the greater
                                                     values to the corresponding positions in xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMAXUD xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   3F /r
VPMAXUD ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   3F /r


Related Instructions

(V)PMAXSB, (V)PMAXSD, (V)PMAXSW, (V)PMAXUB, (V)PMAXUW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
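
For the extended (AVX) forms in protected mode, the invalid-opcode rows of the table amount to a checklist. The sketch below collects the causes that apply to a given machine state. It is an illustrative reading of the table, not a model of instruction decode; the function name and every parameter name are hypothetical.

```python
def avx_invalid_opcode_causes(feature_supported, cr4_osxsave, xfeature_enabled_mask,
                              vex_l, avx2_supported, prefix_before_vex, lock_prefix):
    """List the #UD causes from the table that apply to an AVX-form instruction."""
    causes = []
    if not feature_supported:
        causes.append("instruction not supported per CPUID")
    if cr4_osxsave == 0:
        causes.append("CR4.OSXSAVE = 0")
    if ((xfeature_enabled_mask >> 1) & 0b11) != 0b11:
        causes.append("XFEATURE_ENABLED_MASK[2:1] != 11b")
    if vex_l == 1 and not avx2_supported:
        causes.append("VEX.L = 1 when AVX2 not supported")
    if prefix_before_vex:
        causes.append("REX, F2, F3, or 66 prefix preceding VEX prefix")
    if lock_prefix:
        causes.append("Lock prefix (F0h) preceding opcode")
    return causes


# A fully enabled system running the 256-bit form reports no #UD cause.
assert avx_invalid_opcode_causes(True, 1, 0b111, 1, True, False, False) == []
```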
PMAXUW, VPMAXUW
Packed Maximum Unsigned Words

Compares each packed 16-bit unsigned integer value of the first source operand to the corresponding 
value of the second source operand and writes the numerically greater value into the corresponding 
word of the destination. 

The 128-bit form of the instruction compares eight pairs of 16-bit unsigned integer values; the 256-bit 
form compares 16 pairs. 

There are legacy and extended forms of the instruction: 

PMAXUW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMAXUW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register
or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form               Subset   Feature Flag
PMAXUW             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMAXUW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMAXUW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                           Opcode            Description
PMAXUW xmm1, xmm2/mem128           66 0F 38 3E /r    Compares eight pairs of packed unsigned 16-bit values
                                                     in xmm1 and xmm2 or mem128 and writes the greater
                                                     values to the corresponding positions in xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMAXUW xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   3E /r
VPMAXUW ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   3E /r


Related Instructions

(V)PMAXSB, (V)PMAXSD, (V)PMAXSW, (V)PMAXUB, (V)PMAXUD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PMINSB, VPMINSB
Packed Minimum Signed Bytes

Compares each packed 8-bit signed integer value of the first source operand to the corresponding 
value of the second source operand and writes the numerically lesser value into the corresponding 
byte of the destination. 

The 128-bit form of the instruction compares 16 pairs of 8-bit signed integer values; the 256-bit form 
compares 32 pairs. 

There are legacy and extended forms of the instruction: 

PMINSB 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMINSB 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register
or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 
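
The packed minimum and maximum instructions differ only in element width, signedness, and the direction of the comparison, so a single parameterized model can cover the family. The following is a sketch under that assumption, with registers held as Python integers; `packed_select` and `vpminsb_256` are hypothetical names.

```python
def packed_select(src1, src2, lane_bits, total_bits, signed, pick_max):
    """Model a packed integer minimum or maximum on integer register images."""
    mask = (1 << lane_bits) - 1
    sign_bit = 1 << (lane_bits - 1)

    def key(lane):
        # Signed forms compare two's-complement values; unsigned forms compare raw lanes.
        return lane - (1 << lane_bits) if signed and lane & sign_bit else lane

    result = 0
    for shift in range(0, total_bits, lane_bits):
        a = (src1 >> shift) & mask
        b = (src2 >> shift) & mask
        if pick_max:
            winner = a if key(a) >= key(b) else b
        else:
            winner = a if key(a) <= key(b) else b
        result |= winner << shift
    return result


def vpminsb_256(src1, src2):
    """256-bit VPMINSB: 32 pairs of signed bytes, lesser value kept per lane."""
    return packed_select(src1, src2, lane_bits=8, total_bits=256,
                         signed=True, pick_max=False)


all_80 = int.from_bytes(bytes([0x80] * 32), "little")   # -128 in every byte
all_7f = int.from_bytes(bytes([0x7F] * 32), "little")   # +127 in every byte
assert vpminsb_256(all_80, all_7f) == all_80
```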

Instruction Support 


Form               Subset   Feature Flag
PMINSB             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMINSB 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMINSB 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                           Opcode            Description
PMINSB xmm1, xmm2/mem128           66 0F 38 38 /r    Compares 16 pairs of packed 8-bit values in xmm1 and
                                                     xmm2 or mem128 and writes the lesser values to the
                                                     corresponding positions in xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMINSB xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   38 /r
VPMINSB ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   38 /r


Related Instructions

(V)PMINSD, (V)PMINSW, (V)PMINUB, (V)PMINUD, (V)PMINUW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PMINSD, VPMINSD
Packed Minimum Signed Doublewords

Compares each packed 32-bit signed integer value of the first source operand to the corresponding 
value of the second source operand and writes the numerically lesser value into the corresponding 
doubleword of the destination. 

The 128-bit form of the instruction compares four pairs of 32-bit signed integer values; the 256-bit
form compares eight pairs.

There are legacy and extended forms of the instruction: 

PMINSD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMINSD 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register
or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form               Subset   Feature Flag
PMINSD             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMINSD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMINSD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                           Opcode            Description
PMINSD xmm1, xmm2/mem128           66 0F 38 39 /r    Compares four pairs of packed 32-bit values in xmm1
                                                     and xmm2 or mem128 and writes the lesser values to
                                                     the corresponding positions in xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMINSD xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   39 /r
VPMINSD ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   39 /r
Related Instructions

(V)PMINSB, (V)PMINSW, (V)PMINUB, (V)PMINUD, (V)PMINUW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PMINSW, VPMINSW
Packed Minimum Signed Words

Compares each packed 16-bit signed integer value of the first source operand to the corresponding 
value of the second source operand and writes the numerically lesser value into the corresponding 
word of the destination. 

The 128-bit form of the instruction compares eight pairs of 16-bit signed integer values; the 256-bit
form compares 16 pairs.

There are legacy and extended forms of the instruction: 

PMINSW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMINSW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register
or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form               Subset   Feature Flag
PMINSW             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMINSW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMINSW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                           Opcode            Description
PMINSW xmm1, xmm2/mem128           66 0F EA /r       Compares eight pairs of packed 16-bit values in xmm1
                                                     and xmm2 or mem128 and writes the lesser values to the
                                                     corresponding positions in xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMINSW xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   EA /r
VPMINSW ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   EA /r
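
The legacy opcode bytes and the VEX fields in these tables are related: the 66h prefix corresponds to pp = 01b, the 0Fh escape to map_select 01h, and the 0Fh 38h escape to map_select 02h. The sketch below derives the VEX fields from a legacy byte sequence. The lookup tables reflect the standard VEX field assignments, which are assumed here rather than taken from this page, and `vex_fields_from_legacy` is a hypothetical name.

```python
# Escape-byte sequences and their VEX map_select values (assumed standard assignment).
MAP_SELECT = {(0x0F,): 0x01, (0x0F, 0x38): 0x02, (0x0F, 0x3A): 0x03}
# Mandatory legacy prefix and the matching VEX.pp value.
PP = {None: 0b00, 0x66: 0b01, 0xF3: 0b10, 0xF2: 0b11}


def vex_fields_from_legacy(opcode_bytes):
    """Derive (map_select, pp, opcode) from legacy bytes such as 66 0F EA."""
    data = list(opcode_bytes)
    prefix = data.pop(0) if data[0] in (0x66, 0xF2, 0xF3) else None
    opcode = data.pop()              # the last byte is the opcode itself
    return MAP_SELECT[tuple(data)], PP[prefix], opcode


# PMINSW (66 0F EA) maps to RXB.01, pp = 01, opcode EA.
assert vex_fields_from_legacy([0x66, 0x0F, 0xEA]) == (0x01, 0b01, 0xEA)
# PMINSB (66 0F 38 38) maps to RXB.02, pp = 01, opcode 38.
assert vex_fields_from_legacy([0x66, 0x0F, 0x38, 0x38]) == (0x02, 0b01, 0x38)
```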
Related Instructions

(V)PMINSB, (V)PMINSD, (V)PMINUB, (V)PMINUD, (V)PMINUW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PMINUB, VPMINUB
Packed Minimum Unsigned Bytes

Compares each packed 8-bit unsigned integer value of the first source operand to the corresponding 
value of the second source operand and writes the numerically lesser value into the corresponding 
byte of the destination. 

The 128-bit form of the instruction compares 16 pairs of 8-bit unsigned integer values; the 256-bit
form compares 32 pairs.

There are legacy and extended forms of the instruction: 

PMINUB 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMINUB 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register
or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form               Subset   Feature Flag
PMINUB             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMINUB 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMINUB 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                           Opcode            Description
PMINUB xmm1, xmm2/mem128           66 0F DA /r       Compares 16 pairs of packed unsigned 8-bit values in
                                                     xmm1 and xmm2 or mem128 and writes the lesser
                                                     values to the corresponding positions in xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMINUB xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   DA /r
VPMINUB ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   DA /r
Related Instructions

(V)PMINSB, (V)PMINSD, (V)PMINSW, (V)PMINUD, (V)PMINUW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PMINUD Packed Minimum

VPMINUD Unsigned Doublewords 

Compares each packed 32-bit unsigned integer value of the first source operand to the corresponding 
value of the second source operand and writes the numerically lesser value into the corresponding 
doubleword of the destination. 

The 128-bit form of the instruction compares four pairs of 32-bit unsigned integer values; the 256-bit
form compares eight.
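
The element-wise comparison can be sketched as a scalar reference model. The code below is an illustration, not a description of the hardware: it treats each operand as a Python integer holding the whole vector, and the function and parameter names are assumptions made for this example.

```python
def packed_min_unsigned(src1, src2, element_bits=32, vector_bits=128):
    """Return the element-wise unsigned minimum of two packed vectors.

    Each vector is a non-negative integer; element 0 occupies the
    least-significant bits.
    """
    mask = (1 << element_bits) - 1
    result = 0
    for shift in range(0, vector_bits, element_bits):
        a = (src1 >> shift) & mask
        b = (src2 >> shift) & mask
        result |= min(a, b) << shift   # unsigned compare: 80000000h > 7FFFFFFFh
    return result


a = 0x00000001_FFFFFFFF_80000000_00000005
b = 0x00000002_00000000_7FFFFFFF_00000005
print(f"{packed_min_unsigned(a, b):032X}")
```

Passing vector_bits=256 models the 256-bit form, and element_bits of 8 or 16 models the byte and word variants (PMINUB, PMINUW).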

There are legacy and extended forms of the instruction: 

PMINUD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMINUD 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM
register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 
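
The legacy form and the extended XMM encoding differ only in how they treat bits [255:128] of the destination's YMM register. A minimal sketch of that difference, assuming register contents are modelled as Python integers and using a hypothetical helper name:

```python
LOW128 = (1 << 128) - 1


def write_xmm_result(old_ymm, result128, extended):
    """Model the write of a 128-bit result into a 256-bit YMM register.

    extended=False models the legacy form (upper bits not affected);
    extended=True models the VEX XMM encoding (upper bits cleared).
    """
    result128 &= LOW128
    if extended:
        return result128
    return (old_ymm & ~LOW128) | result128


old = (0xABCD << 128) | 0x1234
print(hex(write_xmm_result(old, 0x55, extended=False)))
print(hex(write_xmm_result(old, 0x55, extended=True)))
```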

Instruction Support 


Form               Subset   Feature Flag
PMINUD             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMINUD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMINUD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                    Opcode           Description
PMINUD xmm1, xmm2/mem128    66 0F 38 3B /r   Compares four pairs of packed unsigned 32-bit values
                                             in xmm1 and xmm2 or mem128 and writes the lesser
                                             values to the corresponding positions in xmm1.

Mnemonic                            Encoding
                                    VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMINUD xmm1, xmm2, xmm3/mem128     C4   RXB.02          X.src1.0.01  3B /r
VPMINUD ymm1, ymm2, ymm3/mem256     C4   RXB.02          X.src1.1.01  3B /r
Related Instructions

(V)PMINSB, (V)PMINSD, (V)PMINSW, (V)PMINUB, (V)PMINUW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                           -     -     A     Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PMINUW Packed Minimum Unsigned Words

VPMINUW 

Compares each packed 16-bit unsigned integer value of the first source operand to the corresponding 
value of the second source operand and writes the numerically lesser value into the corresponding 
word of the destination. 

The 128-bit form of the instruction compares eight pairs of 16-bit unsigned integer values; the 256-bit 
form compares 16 pairs. 

There are legacy and extended forms of the instruction: 

PMINUW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMINUW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM
register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form               Subset   Feature Flag
PMINUW             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMINUW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMINUW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                    Opcode           Description
PMINUW xmm1, xmm2/mem128    66 0F 38 3A /r   Compares eight pairs of packed unsigned 16-bit values
                                             in xmm1 and xmm2 or mem128 and writes the lesser
                                             values to the corresponding positions in xmm1.

Mnemonic                            Encoding
                                    VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMINUW xmm1, xmm2, xmm3/mem128     C4   RXB.02          X.src1.0.01  3A /r
VPMINUW ymm1, ymm2, ymm3/mem256     C4   RXB.02          X.src1.1.01  3A /r
Related Instructions

(V)PMINSB, (V)PMINSD, (V)PMINSW, (V)PMINUB, (V)PMINUD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                           -     -     A     Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PMOVMSKB Packed Move Mask

VPMOVMSKB Byte 

Copies the value of the most-significant bit of each byte element of the source operand to create a
16-bit or 32-bit mask value, zero-extends the value, and writes it to the destination.

There are legacy and extended forms of the instruction: 

PMOVMSKB 

The source operand is an XMM register. The destination is a 32-bit general purpose register. The
mask is zero-extended to fill the destination register; the mask occupies bits [15:0].

VPMOVMSKB

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The source operand is an XMM register. The destination is a 64-bit general purpose register. The
mask is zero-extended to fill the destination register; the mask occupies bits [15:0].

YMM Encoding

The source operand is a YMM register. The destination is a 64-bit general purpose register. The mask
is zero-extended to fill the destination register; the mask occupies bits [31:0].
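
The mask construction can be sketched as a scalar reference model. This is an illustration under the assumption that the source vector is held in a Python integer with byte 0 in the least-significant position; the function name is hypothetical.

```python
def byte_move_mask(vector, vector_bits=128):
    """Collect the most-significant bit of each byte into a mask.

    Bit i of the result is bit 7 of byte i of the source. The result
    is naturally zero-extended because no higher bits are ever set.
    """
    mask = 0
    for i in range(vector_bits // 8):
        msb = (vector >> (8 * i + 7)) & 1
        mask |= msb << i
    return mask


# Byte 0 = 80h, byte 1 = 7Fh, byte 15 = FFh: mask bits 0 and 15 are set.
source = 0x80 | (0x7F << 8) | (0xFF << 120)
print(f"{byte_move_mask(source):04X}")
```

With vector_bits=256 the same loop produces the 32-bit mask of the YMM encoding.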


Instruction Support 


Form                 Subset   Feature Flag
PMOVMSKB             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMOVMSKB 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVMSKB 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                 Opcode        Description
PMOVMSKB reg32, xmm1     66 0F D7 /r   Moves a zero-extended mask consisting of the most-
                                       significant bit of each byte in xmm1 to a 32-bit
                                       general-purpose register.

Mnemonic                    Encoding
                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVMSKB reg64, xmm1       C4   RXB.01          X.1111.0.01  D7 /r
VPMOVMSKB reg64, ymm1       C4   RXB.01          X.1111.1.01  D7 /r


Related Instructions 

(V)MOVMSKPD, (V)MOVMSKPS 



rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.vvvv field != 1111b.
                           -     -     A     VEX.L field = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.

X — SSE, AVX and AVX2 exception 

A — AVX, AVX2exception 

S — SSE exception 
PMOVSXBD Packed Move with Sign-Extension

VPMOVSXBD Byte to Doubleword 

Sign-extends four or eight packed 8-bit signed integers in the source operand to 32 bits and writes the 
packed doubleword signed integers to the destination. 

If the source operand is a register, the 8-bit signed integers are taken from the least-significant bytes 
of the register. 

There are legacy and extended forms of the instruction:

PMOVSXBD 

The source operand is either an XMM register or a 32-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not 
affected. 

VPMOVSXBD 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The source operand is either an XMM register or a 32-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The source operand is either an XMM register or a 64-bit memory location. The destination is a 
YMM register. 
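
The sign-extension step can be sketched as a scalar reference model. The code is an illustration only; it assumes the source bytes are packed into a Python integer with byte 0 in the least-significant position, and the function name is hypothetical.

```python
def sign_extend_bytes_to_dwords(source, count=4):
    """Sign-extend the low `count` bytes of `source` to packed 32-bit values.

    count=4 models the 128-bit forms; count=8 models the 256-bit form.
    """
    result = 0
    for i in range(count):
        byte = (source >> (8 * i)) & 0xFF
        value = byte - 0x100 if byte & 0x80 else byte   # interpret as signed
        result |= (value & 0xFFFFFFFF) << (32 * i)      # two's complement, 32 bits
    return result


# Bytes 01h, FFh, 80h, 7Fh become 00000001h, FFFFFFFFh, FFFFFF80h, 0000007Fh.
print(f"{sign_extend_bytes_to_dwords(0x7F80FF01):032X}")
```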

Instruction Support 


Form                 Subset   Feature Flag
PMOVSXBD             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVSXBD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVSXBD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                      Opcode           Description
PMOVSXBD xmm1, xmm2/mem32     66 0F 38 21 /r   Sign-extends four packed signed 8-bit
                                               integers in the four low bytes of xmm2 or
                                               mem32 and writes four packed signed
                                               32-bit integers to xmm1.

Mnemonic                      Encoding
                              VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVSXBD xmm1, xmm2/mem32    C4   RXB.02          X.1111.0.01  21 /r
VPMOVSXBD ymm1, xmm2/mem64    C4   RXB.02          X.1111.1.01  21 /r
Related Instructions

(V)PMOVSXBQ, (V)PMOVSXBW, (V)PMOVSXDQ, (V)PMOVSXWD, (V)PMOVSXWQ

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.vvvv != 1111b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.
Alignment check, #AC       -     S     X     Unaligned memory reference when alignment checking enabled.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PMOVSXBQ Packed Move with Sign-Extension

VPMOVSXBQ Byte to Quadword 

Sign-extends two or four packed 8-bit signed integers in the source operand to 64 bits and writes the 
packed quadword signed integers to the destination. 

If the source operand is a register, the 8-bit signed integers are taken from the least-significant bytes 
of the register. 

There are legacy and extended forms of the instruction:

PMOVSXBQ 

The source operand is either an XMM register or a 16-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not 
affected. 

VPMOVSXBQ 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The source operand is either an XMM register or a 16-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The source operand is either an XMM register or a 32-bit memory location. The destination is a 
YMM register. 


Instruction Support 


Form                 Subset   Feature Flag
PMOVSXBQ             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVSXBQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVSXBQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                      Opcode           Description
PMOVSXBQ xmm1, xmm2/mem16     66 0F 38 22 /r   Sign-extends two packed signed 8-bit
                                               integers in the two low bytes of xmm2
                                               or mem16 and writes two packed
                                               signed 64-bit integers to xmm1.

Mnemonic                      Encoding
                              VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVSXBQ xmm1, xmm2/mem16    C4   RXB.02          X.1111.0.01  22 /r
VPMOVSXBQ ymm1, xmm2/mem32    C4   RXB.02          X.1111.1.01  22 /r
Related Instructions

(V)PMOVSXBD, (V)PMOVSXBW, (V)PMOVSXDQ, (V)PMOVSXWD, (V)PMOVSXWQ

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.vvvv != 1111b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.
Alignment check, #AC       -     S     X     Unaligned memory reference when alignment checking enabled.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PMOVSXBW Packed Move with Sign-Extension

VPMOVSXBW Byte to Word 

Sign-extends eight or sixteen packed 8-bit signed integers in the source operand to 16 bits and writes 
the packed word signed integers to the destination. 

If the source operand is a register, the eight 8-bit signed integers are taken from the lower half of the 
register. 

There are legacy and extended forms of the instruction:

PMOVSXBW 

The source operand is either an XMM register or a 64-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not 
affected. 

VPMOVSXBW 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The source operand is either an XMM register or a 64-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The source operand is either an XMM register or a 128-bit memory location. The destination is a 
YMM register. 

Instruction Support 


Form                 Subset   Feature Flag
PMOVSXBW             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVSXBW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVSXBW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 


Mnemonic                       Opcode           Description
PMOVSXBW xmm1, xmm2/mem64      66 0F 38 20 /r   Sign-extends eight packed signed 8-bit
                                                integers in the eight low bytes of xmm2 or
                                                mem64 and writes eight packed signed
                                                16-bit integers to xmm1.

Mnemonic                       Encoding
                               VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVSXBW xmm1, xmm2/mem64     C4   RXB.02          X.1111.0.01  20 /r
VPMOVSXBW ymm1, xmm2/mem128    C4   RXB.02          X.1111.1.01  20 /r
Related Instructions

(V)PMOVSXBD, (V)PMOVSXBQ, (V)PMOVSXDQ, (V)PMOVSXWD, (V)PMOVSXWQ

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.vvvv != 1111b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.
Alignment check, #AC       -     S     X     Unaligned memory reference when alignment checking enabled.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PMOVSXDQ Packed Move with Sign-Extension

VPMOVSXDQ Doubleword to Quadword 

Sign-extends two or four packed 32-bit signed integers in the source operand to 64 bits and writes the 
packed quadword signed integers to the destination. 

If the source operand is a register, the two 32-bit signed integers are taken from the lower half of the 
register. 

There are legacy and extended forms of the instruction: 

PMOVSXDQ 

The source operand is either an XMM register or a 64-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not 
affected. 

VPMOVSXDQ 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The source operand is either an XMM register or a 64-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The source operand is either an XMM register or a 128-bit memory location. The destination is a 
YMM register. 

Instruction Support 


Form                 Subset   Feature Flag
PMOVSXDQ             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVSXDQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVSXDQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                       Opcode           Description
PMOVSXDQ xmm1, xmm2/mem64      66 0F 38 25 /r   Sign-extends two packed signed 32-bit
                                                integers in the two low doublewords of
                                                xmm2 or mem64 and writes two packed
                                                signed 64-bit integers to xmm1.

Mnemonic                       Encoding
                               VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVSXDQ xmm1, xmm2/mem64     C4   RXB.02          X.1111.0.01  25 /r
VPMOVSXDQ ymm1, xmm2/mem128    C4   RXB.02          X.1111.1.01  25 /r
Related Instructions

(V)PMOVSXBD, (V)PMOVSXBQ, (V)PMOVSXBW, (V)PMOVSXWD, (V)PMOVSXWQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.vvvv != 1111b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.
Alignment check, #AC       -     S     X     Unaligned memory reference when alignment checking enabled.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PMOVSXWD Packed Move with Sign-Extension

VPMOVSXWD Word to Doubleword 


Sign-extends four or eight packed 16-bit signed integers in the source operand to 32 bits and writes 
the packed doubleword signed integers to the destination. 

If the source operand is a register, the four 16-bit signed integers are taken from the lower half of the 
register. 

There are legacy and extended forms of the instruction:

PMOVSXWD 

The source operand is either an XMM register or a 64-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not 
affected. 

VPMOVSXWD 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The source operand is either an XMM register or a 64-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The source operand is either an XMM register or a 128-bit memory location. The destination is a 
YMM register. 


Instruction Support 


Form                 Subset   Feature Flag
PMOVSXWD             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVSXWD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVSXWD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                       Opcode           Description
PMOVSXWD xmm1, xmm2/mem64      66 0F 38 23 /r   Sign-extends four packed signed 16-bit
                                                integers in the four low words of xmm2 or
                                                mem64 and writes four packed signed 32-bit
                                                integers to xmm1.

Mnemonic                       Encoding
                               VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVSXWD xmm1, xmm2/mem64     C4   RXB.02          X.1111.0.01  23 /r
VPMOVSXWD ymm1, xmm2/mem128    C4   RXB.02          X.1111.1.01  23 /r
Related Instructions

(V)PMOVSXBD, (V)PMOVSXBQ, (V)PMOVSXBW, (V)PMOVSXDQ, (V)PMOVSXWQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.vvvv != 1111b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.
Alignment check, #AC       -     S     X     Unaligned memory reference when alignment checking enabled.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PMOVSXWQ Packed Move with Sign-Extension

VPMOVSXWQ Word to Quadword 

Sign-extends two or four packed 16-bit signed integers to 64 bits and writes the packed quadword 
signed integers to the destination. 

If the source operand is a register, the 16-bit signed integers are taken from the least-significant
words of the register.

There are legacy and extended forms of the instruction:

PMOVSXWQ 

The source operand is either an XMM register or a 32-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not 
affected. 

VPMOVSXWQ 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The source operand is either an XMM register or a 32-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The source operand is either an XMM register or a 64-bit memory location. The destination is a 
YMM register. 
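
The memory operand sizes quoted for the PMOVSX family follow from the number of destination elements multiplied by the source element width. The sketch below works this out; it is an illustration only, and the function name and the use of the two-letter mnemonic suffix as a key are assumptions made for this example.

```python
ELEMENT_BITS = {"B": 8, "W": 16, "D": 32, "Q": 64}


def source_operand_bits(suffix, dest_bits):
    """Return the source width consumed by a PMOVSX/PMOVZX variant.

    suffix is the two-letter size suffix of the mnemonic, for example
    "WQ" for word to quadword; dest_bits is 128 or 256.
    """
    src_width = ELEMENT_BITS[suffix[0]]
    dst_width = ELEMENT_BITS[suffix[1]]
    element_count = dest_bits // dst_width
    return element_count * src_width


for suffix in ("BW", "BD", "BQ", "WD", "WQ", "DQ"):
    print(suffix, source_operand_bits(suffix, 128), source_operand_bits(suffix, 256))
```

For PMOVSXWQ this gives 32 bits for the 128-bit forms and 64 bits for the 256-bit form, matching the operand descriptions above.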

Instruction Support 


Form                 Subset   Feature Flag
PMOVSXWQ             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVSXWQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVSXWQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                      Opcode           Description
PMOVSXWQ xmm1, xmm2/mem32     66 0F 38 24 /r   Sign-extends two packed signed 16-bit
                                               integers in the two low words of xmm2 or
                                               mem32 and writes two packed signed
                                               64-bit integers to xmm1.

Mnemonic                      Encoding
                              VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVSXWQ xmm1, xmm2/mem32    C4   RXB.02          X.1111.0.01  24 /r
VPMOVSXWQ ymm1, xmm2/mem64    C4   RXB.02          X.1111.1.01  24 /r
Related Instructions

(V)PMOVSXBD, (V)PMOVSXBQ, (V)PMOVSXBW, (V)PMOVSXDQ, (V)PMOVSXWD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.vvvv != 1111b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.
Alignment check, #AC       -     S     X     Unaligned memory reference when alignment checking enabled.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PMOVZXBD Packed Move with Zero-Extension

VPMOVZXBD Byte to Doubleword 

Zero-extends four or eight packed 8-bit unsigned integers in the source operand to 32 bits and writes 
the packed doubleword positive-signed integers to the destination. 

If the source operand is a register, the 8-bit unsigned integers are taken from the least-significant
bytes of the register.

There are legacy and extended forms of the instruction: 

PMOVZXBD 

The source operand is either an XMM register or a 32-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not 
affected. 

VPMOVZXBD 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The source operand is either an XMM register or a 32-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The source operand is either an XMM register or a 64-bit memory location. The destination is a 
YMM register. 
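
Zero-extension differs from sign-extension only in that the upper bits of each widened element are always filled with zeros, so a source byte of FFh becomes 000000FFh rather than FFFFFFFFh. A minimal scalar sketch, assuming the source bytes are packed into a Python integer with byte 0 in the least-significant position (the function name is hypothetical):

```python
def zero_extend_bytes_to_dwords(source, count=4):
    """Zero-extend the low `count` bytes of `source` to packed 32-bit values.

    count=4 models the 128-bit forms; count=8 models the 256-bit form.
    """
    result = 0
    for i in range(count):
        byte = (source >> (8 * i)) & 0xFF
        result |= byte << (32 * i)   # upper 24 bits of each doubleword stay zero
    return result


# Bytes 01h, FFh, 80h, 7Fh become 00000001h, 000000FFh, 00000080h, 0000007Fh.
print(f"{zero_extend_bytes_to_dwords(0x7F80FF01):032X}")
```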

Instruction Support 


Form                 Subset   Feature Flag
PMOVZXBD             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVZXBD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVZXBD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                      Opcode           Description
PMOVZXBD xmm1, xmm2/mem32     66 0F 38 31 /r   Zero-extends four packed unsigned 8-bit
                                               integers in the four low bytes of xmm2 or
                                               mem32 and writes four packed positive-
                                               signed 32-bit integers to xmm1.

Mnemonic                      Encoding
                              VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVZXBD xmm1, xmm2/mem32    C4   RXB.02          X.1111.0.01  31 /r
VPMOVZXBD ymm1, xmm2/mem64    C4   RXB.02          X.1111.1.01  31 /r
Related Instructions

(V)PMOVZXBQ, (V)PMOVZXBW, (V)PMOVZXDQ, (V)PMOVZXWD, (V)PMOVZXWQ

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                   Mode               Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                            A     A            AVX instructions are only recognized in protected mode.
                            S     S     S      CR0.EM = 1.
                            S     S     S      CR4.OSFXSR = 0.
                                        A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A      VEX.vvvv != 1111b.
                                        A      VEX.L = 1 when AVX2 not supported.
                                        A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X      CR0.TS = 1.
Stack, #SS                  S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X      Memory address exceeding data segment limit or non-canonical.
                                        X      Null data segment used to reference memory.
Page fault, #PF                   S     X      Instruction execution caused a page fault.
Alignment check, #AC              S     X      Unaligned memory reference when alignment checking enabled.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
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PMOVZXBQ Packed Move Byte to Quadword 

VPMOVZXBQ with Zero-Extension 

Zero-extends two or four packed 8-bit unsigned integers in the source operand to 64 bits and writes
the packed quadword positive-signed integers to the destination.

If the source operand is a register, the 8-bit unsigned integers are taken from the least-significant
bytes of the register.
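
For a memory source, the 128-bit forms read only 16 bits. A minimal sketch of that case follows, treating memory as a little-endian byte string; the function name pmovzxbq_mem16 is hypothetical.

```python
def pmovzxbq_mem16(mem: bytes) -> bytes:
    """Widen two bytes read from memory into two little-endian quadwords."""
    if len(mem) != 2:
        raise ValueError("the 128-bit forms read exactly 16 bits from memory")
    out = bytearray()
    for value in mem:                      # iterating bytes yields unsigned 8-bit integers
        out += value.to_bytes(8, "little") # seven zero bytes follow each source byte
    return bytes(out)


xmm = pmovzxbq_mem16(bytes([0xFF, 0x01]))
print(xmm.hex(" "))
```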

There are legacy and extended forms of the instruction: 

PMOVZXBQ 

The source operand is either an XMM register or a 16-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not 
affected. 

VPMOVZXBQ 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The source operand is either an XMM register or a 16-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The source operand is either an XMM register or a 32-bit memory location. The destination is a 
YMM register. 

Instruction Support 


Form                 Subset   Feature Flag
PMOVZXBQ             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVZXBQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVZXBQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                        Opcode            Description
PMOVZXBQ xmm1, xmm2/mem16       66 0F 38 32 /r    Zero-extends two packed unsigned 8-bit
                                                  integers in the two low bytes of xmm2 or
                                                  mem16 and writes two packed
                                                  positive-signed 64-bit integers to xmm1.

Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMOVZXBQ xmm1, xmm2/mem16      C4    RXB.02           X.1111.0.01   32 /r
VPMOVZXBQ ymm1, xmm2/mem32      C4    RXB.02           X.1111.1.01   32 /r
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Related Instructions 

(V)PMOVZXBD, (V)PMOVZXBW, (V)PMOVZXDQ, (V)PMOVZXWD, (V)PMOVZXWQ

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                   Mode               Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                            A     A            AVX instructions are only recognized in protected mode.
                            S     S     S      CR0.EM = 1.
                            S     S     S      CR4.OSFXSR = 0.
                                        A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A      VEX.vvvv != 1111b.
                                        A      VEX.L = 1 when AVX2 not supported.
                                        A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X      CR0.TS = 1.
Stack, #SS                  S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X      Memory address exceeding data segment limit or non-canonical.
                                        X      Null data segment used to reference memory.
Page fault, #PF                   S     X      Instruction execution caused a page fault.
Alignment check, #AC              S     X      Unaligned memory reference when alignment checking enabled.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
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PMOVZXBW Packed Move Byte to Word with Zero-Extension 
VPMOVZXBW 


Zero-extends eight or sixteen packed 8-bit unsigned integers in the source operand to 16 bits and
writes the packed word positive-signed integers to the destination.

If the source operand is a register, the eight 8-bit unsigned integers are taken from the lower half of
the register.
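
In little-endian byte order, widening bytes to words amounts to placing a zero byte after each source byte. The following sketch shows this for the 256-bit form, which reads sixteen bytes; the function name vpmovzxbw_ymm is hypothetical, and struct is used only to read the result back as words.

```python
import struct


def vpmovzxbw_ymm(src: bytes) -> bytes:
    """Widen sixteen source bytes into sixteen little-endian words (256-bit form)."""
    if len(src) != 16:
        raise ValueError("the 256-bit form reads sixteen bytes")
    dest = bytearray(32)   # all zero: the high byte of every word stays clear
    dest[0::2] = src       # each source byte becomes the low byte of one word
    return bytes(dest)


ymm = vpmovzxbw_ymm(bytes(range(0xF0, 0x100)))
words = struct.unpack("<16H", ymm)
print([hex(w) for w in words])
```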

There are legacy and extended forms of the instruction:

PMOVZXBW 

The source operand is either an XMM register or a 64-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not 
affected. 

VPMOVZXBW 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The source operand is either an XMM register or a 64-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The source operand is either an XMM register or a 128-bit memory location. The destination is a 
YMM register. 


Instruction Support 


Form                 Subset   Feature Flag
PMOVZXBW             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVZXBW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVZXBW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                        Opcode            Description
PMOVZXBW xmm1, xmm2/mem64       66 0F 38 30 /r    Zero-extends eight packed unsigned 8-bit
                                                  integers in the eight low bytes of xmm2 or
                                                  mem64 and writes eight packed
                                                  positive-signed 16-bit integers to xmm1.

Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMOVZXBW xmm1, xmm2/mem64      C4    RXB.02           X.1111.0.01   30 /r
VPMOVZXBW ymm1, xmm2/mem128     C4    RXB.02           X.1111.1.01   30 /r
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Related Instructions 

(V)PMOVZXBD, (V)PMOVZXBQ, (V)PMOVZXDQ, (V)PMOVZXWD, (V)PMOVZXWQ

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                   Mode               Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                            A     A            AVX instructions are only recognized in protected mode.
                            S     S     S      CR0.EM = 1.
                            S     S     S      CR4.OSFXSR = 0.
                                        A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A      VEX.vvvv != 1111b.
                                        A      VEX.L = 1 when AVX2 not supported.
                                        A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X      CR0.TS = 1.
Stack, #SS                  S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X      Memory address exceeding data segment limit or non-canonical.
                                        X      Null data segment used to reference memory.
Page fault, #PF                   S     X      Instruction execution caused a page fault.
Alignment check, #AC              S     X      Unaligned memory reference when alignment checking enabled.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
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PMOVZXDQ Packed Move with Zero-Extension 

VPMOVZXDQ Doubleword to Quadword 

Zero-extends two or four packed 32-bit unsigned integers in the source operand to 64 bits and writes
the packed quadword positive-signed integers to the destination.

If the source operand is a register, the two 32-bit unsigned integers are taken from the lower half of
the register.
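
The term positive-signed reflects that a zero-extended value is never negative, even when it is read as a signed integer of the wider type. The sketch below demonstrates this for the 128-bit form; the helper names as_signed and pmovzxdq_xmm are hypothetical.

```python
def as_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned bit pattern as a two's-complement integer."""
    return value - (1 << bits) if (value >> (bits - 1)) & 1 else value


def pmovzxdq_xmm(src: int) -> list:
    """Return the two zero-extended quadwords for the 128-bit form."""
    return [(src >> (32 * lane)) & 0xFFFFFFFF for lane in range(2)]


lanes = pmovzxdq_xmm((0xFFFFFFFF << 32) | 0x80000000)
for q in lanes:
    # Bit 31 is set in both inputs, so each doubleword is negative as a signed value,
    # yet the widened quadword is the same non-negative number either way.
    print(as_signed(q, 32), as_signed(q, 64))
```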

There are legacy and extended forms of the instruction:

PMOVZXDQ 

The source operand is either an XMM register or a 64-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not 
affected. 

VPMOVZXDQ 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The source operand is either an XMM register or a 64-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The source operand is either an XMM register or a 128-bit memory location. The destination is a 
YMM register. 

Instruction Support 


Form                 Subset   Feature Flag
PMOVZXDQ             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVZXDQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVZXDQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 


Mnemonic                        Opcode            Description
PMOVZXDQ xmm1, xmm2/mem64       66 0F 38 35 /r    Zero-extends two packed unsigned 32-bit
                                                  integers in the two low doublewords of xmm2
                                                  or mem64 and writes two packed
                                                  positive-signed 64-bit integers to xmm1.

Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMOVZXDQ xmm1, xmm2/mem64      C4    RXB.02           X.1111.0.01   35 /r
VPMOVZXDQ ymm1, xmm2/mem128     C4    RXB.02           X.1111.1.01   35 /r
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Related Instructions 

(V)PMOVZXBD, (V)PMOVZXBQ, (V)PMOVZXBW, (V)PMOVZXWD, (V)PMOVZXWQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                   Mode               Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                            A     A            AVX instructions are only recognized in protected mode.
                            S     S     S      CR0.EM = 1.
                            S     S     S      CR4.OSFXSR = 0.
                                        A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A      VEX.vvvv != 1111b.
                                        A      VEX.L = 1 when AVX2 not supported.
                                        A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X      CR0.TS = 1.
Stack, #SS                  S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X      Memory address exceeding data segment limit or non-canonical.
                                        X      Null data segment used to reference memory.
Page fault, #PF                   S     X      Instruction execution caused a page fault.
Alignment check, #AC              S     X      Unaligned memory reference when alignment checking enabled.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
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PMOVZXWD Packed Move Word to Doubleword 

VPMOVZXWD with Zero-Extension 


Zero-extends four or eight packed 16-bit unsigned integers in the source operand to 32 bits and writes
the packed doubleword positive-signed integers to the destination.

If the source operand is a register, the four 16-bit unsigned integers are taken from the lower half of
the register.
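
A short sketch of the 128-bit form on little-endian byte strings follows. It uses the standard struct module to read the four low words and repack them as doublewords; the function name pmovzxwd_xmm is hypothetical.

```python
import struct


def pmovzxwd_xmm(src: bytes) -> bytes:
    """Widen the four low words of src (an XMM image or a 64-bit memory image)."""
    words = struct.unpack_from("<4H", src)   # unsigned 16-bit elements, lower half only
    return struct.pack("<4I", *words)        # repack as unsigned 32-bit elements


# The four words in the upper half of the register image are ignored.
src = struct.pack("<8H", 0xFFFF, 0x8000, 1, 0, 0xAAAA, 0xBBBB, 0xCCCC, 0xDDDD)
dest = pmovzxwd_xmm(src)
print(struct.unpack("<4I", dest))
```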

There are legacy and extended forms of the instruction:

PMOVZXWD 

The source operand is either an XMM register or a 64-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not 
affected. 

VPMOVZXWD 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The source operand is either an XMM register or a 64-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The source operand is either an XMM register or a 128-bit memory location. The destination is a 
YMM register. 


Instruction Support 


Form                 Subset   Feature Flag
PMOVZXWD             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVZXWD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVZXWD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                        Opcode            Description
PMOVZXWD xmm1, xmm2/mem64       66 0F 38 33 /r    Zero-extends four packed unsigned 16-bit
                                                  integers in the four low words of xmm2 or
                                                  mem64 and writes four packed
                                                  positive-signed 32-bit integers to xmm1.

Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMOVZXWD xmm1, xmm2/mem64      C4    RXB.02           X.1111.0.01   33 /r
VPMOVZXWD ymm1, xmm2/mem128     C4    RXB.02           X.1111.1.01   33 /r
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Related Instructions 

(V)PMOVZXBD, (V)PMOVZXBQ, (V)PMOVZXBW, (V)PMOVZXDQ, (V)PMOVZXWQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                   Mode               Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                            A     A            AVX instructions are only recognized in protected mode.
                            S     S     S      CR0.EM = 1.
                            S     S     S      CR4.OSFXSR = 0.
                                        A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A      VEX.vvvv != 1111b.
                                        A      VEX.L = 1 when AVX2 not supported.
                                        A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X      CR0.TS = 1.
Stack, #SS                  S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X      Memory address exceeding data segment limit or non-canonical.
                                        X      Null data segment used to reference memory.
Page fault, #PF                   S     X      Instruction execution caused a page fault.
Alignment check, #AC              S     X      Unaligned memory reference when alignment checking enabled.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
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PMOVZXWQ Packed Move with Zero-Extension 

VPMOVZXWQ Word to Quadword 

Zero-extends two or four packed 16-bit unsigned integers in the source operand to 64 bits and writes
the packed quadword positive-signed integers to the destination.

If the source operand is a register, the 16-bit unsigned integers are taken from the least-significant
words of the register.
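
The 128-bit and 256-bit forms differ only in the number of elements processed. The following sketch expresses both through one parameterized helper operating on Python integers; the names zero_extend_lanes, pmovzxwq_xmm, and vpmovzxwq_ymm are hypothetical.

```python
def zero_extend_lanes(src: int, src_bits: int, dst_bits: int, lanes: int) -> int:
    """Widen the `lanes` least-significant src_bits-wide elements of src to dst_bits each."""
    mask = (1 << src_bits) - 1
    dest = 0
    for lane in range(lanes):
        element = (src >> (src_bits * lane)) & mask
        dest |= element << (dst_bits * lane)
    return dest


def pmovzxwq_xmm(src: int) -> int:
    return zero_extend_lanes(src, 16, 64, 2)   # two words -> two quadwords


def vpmovzxwq_ymm(src: int) -> int:
    return zero_extend_lanes(src, 16, 64, 4)   # four words -> four quadwords


print(hex(pmovzxwq_xmm(0x1234_FFFF)))
print(hex(vpmovzxwq_ymm(0x0004_0003_0002_0001)))
```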

There are legacy and extended forms of the instruction:

PMOVZXWQ 

The source operand is either an XMM register or a 32-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not 
affected. 

VPMOVZXWQ 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The source operand is either an XMM register or a 32-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The source operand is either an XMM register or a 64-bit memory location. The destination is a 
YMM register. 

Instruction Support 


Form                 Subset   Feature Flag
PMOVZXWQ             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVZXWQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVZXWQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 


Mnemonic                        Opcode            Description
PMOVZXWQ xmm1, xmm2/mem32       66 0F 38 34 /r    Zero-extends two packed unsigned 16-bit
                                                  integers in the two low words of xmm2 or
                                                  mem32 and writes two packed
                                                  positive-signed 64-bit integers to xmm1.

Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMOVZXWQ xmm1, xmm2/mem32      C4    RXB.02           X.1111.0.01   34 /r
VPMOVZXWQ ymm1, xmm2/mem64      C4    RXB.02           X.1111.1.01   34 /r
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Related Instructions 

(V)PMOVZXBD, (V)PMOVZXBQ, (V)PMOVZXBW, (V)PMOVZXDQ, (V)PMOVZXWD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                   Mode               Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                            A     A            AVX instructions are only recognized in protected mode.
                            S     S     S      CR0.EM = 1.
                            S     S     S      CR4.OSFXSR = 0.
                                        A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A      VEX.vvvv != 1111b.
                                        A      VEX.L = 1 when AVX2 not supported.
                                        A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X      CR0.TS = 1.
Stack, #SS                  S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X      Memory address exceeding data segment limit or non-canonical.
                                        X      Null data segment used to reference memory.
Page fault, #PF                   S     X      Instruction execution caused a page fault.
Alignment check, #AC              S     X      Unaligned memory reference when alignment checking enabled.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
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PMULDQ Packed Multiply 

VPMULDQ Signed Doubleword to Quadword 

Multiplies two or four pairs of 32-bit signed integers in the first and second source operands and 
writes two or four packed quadword signed integer products to the destination. 

For the 128-bit form of the instruction, the following operations are performed: 

dest is the destination register, either an XMM register or the corresponding YMM register.
src1 is the first source operand. src2 is the second source operand.

dest[63:0] = (src1[31:0] * src2[31:0])
dest[127:64] = (src1[95:64] * src2[95:64])

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[191:128] = (src1[159:128] * src2[159:128])
dest[255:192] = (src1[223:192] * src2[223:192])
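
The operations listed above can be sketched on Python integers as follows. The helper names s32 and pmuldq and the bits parameter are assumptions made for illustration; registers are modeled as non-negative integers.

```python
def s32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed doubleword."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def pmuldq(src1: int, src2: int, bits: int = 128) -> int:
    """Multiply the even-numbered signed doublewords; bits is 128 or 256."""
    dest = 0
    for low in range(0, bits, 64):               # element positions 0, 64, 128, 192
        product = s32(src1 >> low) * s32(src2 >> low)
        dest |= (product & ((1 << 64) - 1)) << low   # 64-bit two's-complement product
    return dest


# Doublewords at bits [63:32] and [127:96] do not take part in the multiplication.
a = (0x22222222 << 96) | (3 << 64) | (0x11111111 << 32) | 0xFFFFFFFE   # 3 and -2
b = (0xFFFFFFFC << 64) | 5                                             # -4 and 5
print(hex(pmuldq(a, b)))
```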

There are legacy and extended forms of the instruction:

PMULDQ 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMULDQ 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form                 Subset   Feature Flag
PMULDQ               SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMULDQ 128-bit      AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULDQ 256-bit      AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
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Instruction Encoding 

Mnemonic                            Opcode            Description
PMULDQ xmm1, xmm2/mem128            66 0F 38 28 /r    Multiplies two packed 32-bit signed integers in
                                                      xmm1[31:0] and xmm1[95:64] by the
                                                      corresponding values in xmm2 or mem128.
                                                      Writes packed 64-bit signed integer products to
                                                      xmm1[63:0] and xmm1[127:64].

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMULDQ xmm1, xmm2, xmm3/mem128     C4    RXB.02           X.src1.0.01   28 /r
VPMULDQ ymm1, ymm2, ymm3/mem256     C4    RXB.02           X.src1.1.01   28 /r

Related Instructions 

(V)PMULLD, (V)PMULHW, (V)PMULHUW, (V)PMULUDQ, (V)PMULLW

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                   Mode               Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                            A     A            AVX instructions are only recognized in protected mode.
                            S     S     S      CR0.EM = 1.
                            S     S     S      CR4.OSFXSR = 0.
                                        A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A      VEX.L = 1 when AVX2 not supported.
                                        A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X      CR0.TS = 1.
Stack, #SS                  S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X      Memory address exceeding data segment limit or non-canonical.
                                        X      Null data segment used to reference memory.
Alignment check, #AC        S     S     S      Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                        A      Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X      Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
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PMULHRSW Packed Multiply High with Round and Scale 

VPMULHRSW Words 

Multiplies each packed 16-bit signed value in the first source operand by the corresponding value in
the second source operand, truncates the 32-bit product to the 18 most significant bits by
right-shifting, then rounds the truncated value by adding 1 to its least-significant bit. Writes
bits [16:1] of the sum to the corresponding word of the destination.
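
The truncate, round, and select steps can be sketched per element as follows. The helper names s16, pmulhrsw_word, and pmulhrsw are hypothetical, and the packed operands are modeled as lists of 16-bit patterns. The sketch relies on Python's right shift being arithmetic for negative integers.

```python
def s16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed word."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def pmulhrsw_word(a: int, b: int) -> int:
    product = s16(a) * s16(b)       # 32-bit signed product
    top18 = product >> 14           # keep the 18 most significant bits
    rounded = top18 + 1             # add 1 to the least-significant bit
    return (rounded >> 1) & 0xFFFF  # bits [16:1] of the sum


def pmulhrsw(src1: list, src2: list) -> list:
    return [pmulhrsw_word(a, b) for a, b in zip(src1, src2)]


# 4000h * 4000h gives 2000h; 8000h * 8000h wraps to 8000h.
print([hex(w) for w in pmulhrsw([0x4000, 0x8000, 3], [0x4000, 0x8000, 0x2000])])
```

If the words are read as fixed-point fractions with 15 fractional bits (an interpretation, not something the instruction requires), the first element is 0.5 * 0.5 = 0.25.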


There are legacy and extended forms of the instruction: 

PMULHRSW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMULHRSW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form                 Subset   Feature Flag
PMULHRSW             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPMULHRSW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULHRSW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                              Opcode            Description
PMULHRSW xmm1, xmm2/mem128            66 0F 38 0B /r    Multiplies each packed 16-bit signed value in xmm1
                                                        by the corresponding value in xmm2 or mem128,
                                                        truncates product to 18 bits, rounds by adding 1.
                                                        Writes bits [16:1] of the sum to xmm1.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMULHRSW xmm1, xmm2, xmm3/mem128     C4    RXB.02           X.src1.0.01   0B /r
VPMULHRSW ymm1, ymm2, ymm3/mem256     C4    RXB.02           X.src1.1.01   0B /r



AMD64 Technology 


26568 — Rev. 3.23—February 2019 


Related Instructions 

None 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                   Mode               Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                            A     A            AVX instructions are only recognized in protected mode.
                            S     S     S      CR0.EM = 1.
                            S     S     S      CR4.OSFXSR = 0.
                                        A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A      VEX.L = 1 when AVX2 not supported.
                                        A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X      CR0.TS = 1.
Stack, #SS                  S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X      Memory address exceeding data segment limit or non-canonical.
                                        X      Null data segment used to reference memory.
Alignment check, #AC        S     S     S      Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                        A      Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X      Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
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PMULHUW Packed Multiply High 

VPMULHUW Unsigned Word 

Multiplies each packed 16-bit unsigned value in the first source operand by the corresponding value
in the second source operand; writes the high-order 16 bits of each 32-bit product to the
corresponding word of the destination.
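
A minimal per-element sketch of the unsigned high-half multiply follows. The function name pmulhuw is reused here for a hypothetical Python helper, and the packed operands are modeled as lists of 16-bit patterns.

```python
def pmulhuw(src1: list, src2: list) -> list:
    """Return bits [31:16] of each unsigned 16-bit by 16-bit product."""
    return [((a & 0xFFFF) * (b & 0xFFFF)) >> 16 for a, b in zip(src1, src2)]


# FFFFh * FFFFh = FFFE0001h, so the high word is FFFEh.
print([hex(w) for w in pmulhuw([0xFFFF, 0x8000], [0xFFFF, 2])])
```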


There are legacy and extended forms of the instruction:

PMULHUW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMULHUW 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form                 Subset   Feature Flag
PMULHUW              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMULHUW 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULHUW 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                             Opcode         Description
PMULHUW xmm1, xmm2/mem128            66 0F E4 /r    Multiplies packed 16-bit unsigned values in xmm1 by
                                                    the corresponding values in xmm2 or mem128. Writes
                                                    bits [31:16] of each product to xmm1.

Mnemonic                             Encoding
                                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMULHUW xmm1, xmm2, xmm3/mem128     C4    RXB.01           X.src1.0.01   E4 /r
VPMULHUW ymm1, ymm2, ymm3/mem256     C4    RXB.01           X.src1.1.01   E4 /r
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Related Instructions 

(V)PMULDQ, (V)PMULHW, (V)PMULLD, (V)PMULLW, (V)PMULUDQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                   Mode               Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                            A     A            AVX instructions are only recognized in protected mode.
                            S     S     S      CR0.EM = 1.
                            S     S     S      CR4.OSFXSR = 0.
                                        A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A      VEX.L = 1 when AVX2 not supported.
                                        A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X      CR0.TS = 1.
Stack, #SS                  S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X      Memory address exceeding data segment limit or non-canonical.
                                        X      Null data segment used to reference memory.
Alignment check, #AC        S     S     S      Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                        A      Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X      Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
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PMULHW Packed Multiply High 

VPMULHW Signed Word 

Multiplies each packed 16-bit signed value in the first source operand by the corresponding value in 
the second source operand; writes the high-order 16 bits of each 32-bit product to the corresponding 
word of the destination. 
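
The signed high-half multiply differs from the unsigned one only in how the source words are interpreted. A per-element sketch follows; the helper names s16 and pmulhw are hypothetical, and the packed operands are modeled as lists of 16-bit patterns.

```python
def s16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed word."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def pmulhw(src1: list, src2: list) -> list:
    """Return bits [31:16] of each signed 16-bit by 16-bit product."""
    # Python's >> is arithmetic on negative integers, so the sign is carried into bit 31.
    return [((s16(a) * s16(b)) >> 16) & 0xFFFF for a, b in zip(src1, src2)]


# (-1) * (-1) = 1 gives 0000h; (-1) * 1 = -1 gives FFFFh; 8000h * 8000h gives 4000h.
print([hex(w) for w in pmulhw([0xFFFF, 0xFFFF, 0x8000], [0xFFFF, 1, 0x8000])])
```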


There are legacy and extended forms of the instruction:

PMULHW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMULHW 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form                 Subset   Feature Flag
PMULHW               SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMULHW 128-bit      AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULHW 256-bit      AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 


Mnemonic                            Opcode         Description
PMULHW xmm1, xmm2/mem128            66 0F E5 /r    Multiplies packed 16-bit signed values in xmm1 by the
                                                   corresponding values in xmm2 or mem128. Writes bits
                                                   [31:16] of each product to xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMULHW xmm1, xmm2, xmm3/mem128     C4    RXB.01           X.src1.0.01   E5 /r
VPMULHW ymm1, ymm2, ymm3/mem256     C4    RXB.01           X.src1.1.01   E5 /r
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Related Instructions 

(V)PMULDQ, (V)PMULHUW, (V)PMULLD, (V)PMULLW, (V)PMULUDQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                   Mode               Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                            A     A            AVX instructions are only recognized in protected mode.
                            S     S     S      CR0.EM = 1.
                            S     S     S      CR4.OSFXSR = 0.
                                        A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A      VEX.L = 1 when AVX2 not supported.
                                        A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X      CR0.TS = 1.
Stack, #SS                  S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X      Memory address exceeding data segment limit or non-canonical.
                                        X      Null data segment used to reference memory.
Alignment check, #AC        S     S     S      Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                        A      Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X      Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
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PMULLD Packed Multiply and Store Low 

VPMULLD Signed Doubleword 

Multiplies four packed 32-bit signed integers in the first source operand by the corresponding values 
in the second source operand and writes bits [31:0] of each 64-bit product to the corresponding 32-bit 
element of the destination. 
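
The element-wise behavior described above can be modeled in a few lines. The following is a minimal Python sketch, not AMD reference code; the function name `pmulld` and the representation of each operand as a list of signed 32-bit element values are assumptions made for illustration.

```python
def pmulld(src1, src2):
    """Illustrative model of (V)PMULLD.

    src1, src2: lists of signed 32-bit integers, one entry per element
    (4 entries for an XMM operand, 8 for a YMM operand).
    Returns the low 32 bits of each product, reinterpreted as signed.
    """
    def to_signed32(value):
        value &= 0xFFFFFFFF                      # keep bits [31:0] only
        return value - 0x100000000 if value & 0x80000000 else value

    return [to_signed32(a * b) for a, b in zip(src1, src2)]
```

For example, `pmulld([0x7FFFFFFF], [2])` returns `[-2]`, because only bits [31:0] of the product (FFFF_FFFEh) are kept.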


There are legacy and extended forms of the instruction:

PMULLD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMULLD 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support

Form               Subset    Feature Flag
PMULLD             SSE4.1    CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMULLD 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULLD 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                           Opcode            Description
PMULLD xmm1, xmm2/mem128           66 0F 38 40 /r    Multiplies four packed 32-bit signed integers in
                                                     xmm1 by corresponding values in xmm2 or
                                                     mem128. Writes bits [31:0] of each 64-bit product to
                                                     the corresponding 32-bit element of xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMULLD xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   40 /r
VPMULLD ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   40 /r

Related Instructions

(V)PMULDQ, (V)PMULHUW, (V)PMULHW, (V)PMULLW, (V)PMULUDQ

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

                                  Mode
Exception                   Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

PMULLW Packed Multiply Low

VPMULLW Signed Word

Multiplies eight packed 16-bit signed integers in the first source operand by the corresponding values 
in the second source operand and writes bits [15:0] of each 32-bit product to the corresponding 16-bit 
element of the destination. 
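
As an illustration of how the low half relates to the full product, the sketch below models (V)PMULLW together with (V)PMULHW, which returns bits [31:16] of the same 32-bit product. It is a minimal Python model under stated assumptions: the helper names `_signed`, `pmullw`, and `pmulhw` are hypothetical, and operands are represented as lists of signed 16-bit element values.

```python
def _signed(value, bits):
    """Reinterpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pmullw(src1, src2):
    """Illustrative model of (V)PMULLW: bits [15:0] of each 32-bit product."""
    return [_signed(a * b, 16) for a, b in zip(src1, src2)]


def pmulhw(src1, src2):
    """Illustrative model of (V)PMULHW: bits [31:16] of each 32-bit product."""
    return [_signed((a * b) >> 16, 16) for a, b in zip(src1, src2)]
```

Combining the two results as `(high << 16) | (low & 0xFFFF)` reconstructs the full signed 32-bit product of each element pair.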


There are legacy and extended forms of the instruction:

PMULLW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMULLW 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support

Form               Subset    Feature Flag
PMULLW             SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMULLW 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULLW 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                           Opcode         Description
PMULLW xmm1, xmm2/mem128           66 0F D5 /r    Multiplies eight packed 16-bit signed integers in
                                                  xmm1 by corresponding values in xmm2 or
                                                  mem128. Writes bits [15:0] of each 32-bit product to
                                                  the corresponding 16-bit element of xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMULLW xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   D5 /r
VPMULLW ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   D5 /r

Related Instructions

(V)PMULDQ, (V)PMULHUW, (V)PMULHW, (V)PMULLD, (V)PMULUDQ

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

                                  Mode
Exception                   Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

PMULUDQ Packed Multiply

VPMULUDQ Unsigned Doubleword to Quadword

Multiplies two or four pairs of 32-bit unsigned integers in the first and second source operands and 
writes two or four packed quadword unsigned integer products to the destination. 

For the 128-bit form of the instruction, the following operations are performed: 

dest is the destination register, either an XMM register or the corresponding YMM register.
src1 is the first source operand. src2 is the second source operand.

dest[63:0] = (src1[31:0] * src2[31:0])
dest[127:64] = (src1[95:64] * src2[95:64])

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[191:128] = (src1[159:128] * src2[159:128])
dest[255:192] = (src1[223:192] * src2[223:192])
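
The operations listed above can be checked with a small model. The following Python sketch is illustrative only; the function name `pmuludq`, the `width` parameter, and the use of a single Python integer to hold the bits of each register are assumptions, not part of the architecture definition.

```python
MASK32 = 0xFFFFFFFF


def pmuludq(src1, src2, width=128):
    """Illustrative model of (V)PMULUDQ.

    src1, src2: Python integers holding the register bits (bit 0 = LSB).
    width: 128 for the XMM form, 256 for the YMM form.
    Only the even-numbered doublewords of each source take part.
    """
    dest = 0
    for base in range(0, width, 64):
        a = (src1 >> base) & MASK32          # src1[base+31:base]
        b = (src2 >> base) & MASK32          # src2[base+31:base]
        dest |= (a * b) << base              # 64-bit unsigned product
    return dest
```

The odd-numbered doublewords (for example, bits [63:32] and [127:96]) are ignored by the multiply, which is why each 32-bit by 32-bit product fits in its 64-bit destination element without overflow.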


There are legacy and extended forms of the instruction:

PMULUDQ 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPMULUDQ 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support

Form                Subset    Feature Flag
PMULUDQ             SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMULUDQ 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULUDQ 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                            Opcode         Description
PMULUDQ xmm1, xmm2/mem128           66 0F F4 /r    Multiplies two packed 32-bit unsigned integers in
                                                   xmm1[31:0] and xmm1[95:64] by the
                                                   corresponding values in xmm2 or mem128.
                                                   Writes packed 64-bit unsigned integer products to
                                                   xmm1[63:0] and xmm1[127:64].

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMULUDQ xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   F4 /r
VPMULUDQ ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   F4 /r

Related Instructions

(V)PMULDQ, (V)PMULHUW, (V)PMULHW, (V)PMULLD, (V)PMULLW, (V)PMULUDQ

rFLAGS Affected 

None 


MXCSR Flags Affected 

None 


Exceptions

                                  Mode
Exception                   Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

POR Packed OR

VPOR

Performs a bitwise OR of the first and second source operands and writes the result to the destination. 
When one or both of a pair of corresponding bits in the first and second operands are set, the
corresponding bit of the destination is set; when neither source bit is set, the destination bit is cleared.


There are legacy and extended forms of the instruction: 

POR 

The first source operand is an XMM register and the second source operand is either an XMM
register or a 128-bit memory location. The first source XMM register is also the destination. Bits
[255:128] of the YMM register that corresponds to the destination are not affected. 

VPOR 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The first source operand is an XMM register and the second source operand is either an XMM
register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared. 

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support

Form            Subset    Feature Flag
POR             SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPOR 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPOR 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                        Opcode         Description
POR xmm1, xmm2/mem128           66 0F EB /r    Performs bitwise OR of values in xmm1 and xmm2 or
                                               mem128. Writes results to xmm1.

Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPOR xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   EB /r
VPOR ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   EB /r

Related Instructions

(V)PAND, (V)PANDN, (V)PXOR

rFLAGS Affected

None 

MXCSR Flags Affected 

None 


Exceptions

                                  Mode
Exception                   Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

PSADBW Packed Sum of Absolute Differences

VPSADBW Bytes to Words

Subtracts the 16 or 32 packed 8-bit unsigned integers in the second source operand from the
corresponding values in the first source operand and computes the absolute value of the differences.
Computes two or four unsigned 16-bit integer sums of groups of eight absolute differences and writes
the sums to specific words of the destination.

For the 128-bit form of the instruction: 

• The unsigned 16-bit integer sum of absolute differences of the eight bytes [7:0] of the source 
operands is written to bits [15:0] of the destination; bits [63:16] are cleared. 

• The unsigned 16-bit integer sum of absolute differences of the eight bytes [15:8] of the source 
operands is written to bits [79:64] of the destination; bits [127:80] are cleared. 

Additionally, for the 256-bit form of the instruction: 

• The unsigned 16-bit integer sum of absolute differences of the eight bytes [23:16] of the source 
operands is written to bits [143:128] of the destination; bits [191:144] are cleared. 

• The unsigned 16-bit integer sum of absolute differences of the eight bytes [31:24] of the source
operands is written to bits [207:192] of the destination; bits [255:208] are cleared.
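
The grouping of bytes into sums can be sketched as follows. This is a minimal Python model under stated assumptions: the function name `psadbw` is hypothetical, each operand is a `bytes` object of 16 or 32 unsigned bytes with byte 0 first, and the result is returned as a list with one integer per 64-bit destination element.

```python
def psadbw(src1, src2):
    """Illustrative model of (V)PSADBW.

    Returns one sum per group of eight bytes. Each sum is at most
    8 * 255 = 2040, so it fits in the low 16 bits of its 64-bit
    destination element; the remaining bits of the element are zero.
    """
    if len(src1) != len(src2) or len(src1) not in (16, 32):
        raise ValueError("operands must both be 16 or both be 32 bytes long")
    sums = []
    for base in range(0, len(src1), 8):
        group1 = src1[base:base + 8]
        group2 = src2[base:base + 8]
        sums.append(sum(abs(a - b) for a, b in zip(group1, group2)))
    return sums
```

For a 16-byte operand the first returned value corresponds to destination bits [15:0] and the second to bits [79:64]; a 32-byte operand adds the values for bits [143:128] and [207:192].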

There are legacy and extended forms of the instruction: 

PSADBW 

The first source operand is an XMM register and the second source operand is either an XMM
register or a 128-bit memory location. The first source XMM register is also the destination. Bits
[255:128] of the YMM register that corresponds to the destination are not affected. 

VPSADBW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The first source operand is an XMM register and the second source operand is either an XMM
register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared. 

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support

Form               Subset    Feature Flag
PSADBW             SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSADBW 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSADBW 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                           Opcode         Description
PSADBW xmm1, xmm2/mem128           66 0F F6 /r    Computes the sum of the absolute differences of two sets
                                                  of packed 8-bit unsigned integer values in xmm1 and
                                                  xmm2 or mem128. Writes 16-bit unsigned integer sums
                                                  to xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSADBW xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   F6 /r
VPSADBW ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   F6 /r

Related Instructions

(V)MPSADBW

rFLAGS Affected

None

MXCSR Flags Affected

None


Exceptions

                                  Mode
Exception                   Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

PSHUFB Packed Shuffle

VPSHUFB Byte

Copies bytes from the first source operand to the destination or clears bytes in the destination, as 
specified by control bytes in the second source operand. 

The control bytes occupy positions in the source operand that correspond to positions in the
destination. Each control byte has the following fields.

  7      6        4    3          0
  FRZ    Reserved      SRCIndex

Bits     Description
[7]      Set the bit to clear the corresponding byte of the destination.
         Clear the bit to copy the selected source byte to the corresponding byte of the destination.
[6:4]    Reserved
[3:0]    Binary value selects the source byte.


For the 256-bit form of the instruction, the SRCIndex fields in the upper 16 bytes of the second 
source operand select bytes in the upper 16 bytes of the first source operand to be copied. 
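
The control-byte rules above can be modeled as follows. This is a minimal Python sketch, not a definitive implementation; the function name `pshufb` and the use of `bytes` objects (byte 0 first) for the operands are assumptions, and the sketch also assumes that the reserved bits [6:4] are simply ignored.

```python
def pshufb(src, control):
    """Illustrative model of (V)PSHUFB.

    src: first source operand, 16 or 32 bytes.
    control: second source operand, one control byte per destination byte.
    """
    if len(src) != len(control) or len(src) not in (16, 32):
        raise ValueError("operands must both be 16 or both be 32 bytes long")
    dest = bytearray(len(src))
    for i, ctl in enumerate(control):
        if ctl & 0x80:
            dest[i] = 0                        # bit 7 set: clear the byte
        else:
            half_base = i & ~0x0F              # 0 for bytes 0-15, 16 for bytes 16-31
            dest[i] = src[half_base + (ctl & 0x0F)]   # bits [3:0] select the source byte
    return bytes(dest)
```

Because the index field is only four bits wide, a control byte in the upper 16 bytes of a 256-bit operand can only select from the upper 16 bytes of the first source; bytes never cross between the two 128-bit halves.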

There are legacy and extended forms of the instruction: 

PSHUFB 

The first source operand is an XMM register and the second source operand is either an XMM
register or a 128-bit memory location. The first source XMM register is also the destination. Bits
[255:128] of the YMM register that corresponds to the destination are not affected. 

VPSHUFB 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register and the second source operand is either an XMM
register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared. 

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support

Form               Subset    Feature Flag
PSHUFB             SSSE3     CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPSHUFB 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSHUFB 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                           Opcode            Description
PSHUFB xmm1, xmm2/mem128           66 0F 38 00 /r    Moves bytes in xmm1 as specified by control bytes in
                                                     xmm2 or mem128.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSHUFB xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   00 /r
VPSHUFB ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   00 /r


Related Instructions

(V)PSHUFD, (V)PSHUFW, (V)PSHUFHW, (V)PSHUFLW

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

                                  Mode
Exception                   Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

PSHUFD Packed Shuffle

VPSHUFD Doublewords


Copies packed doubleword values from a source to a doubleword in the destination, as specified by 
bit fields of an immediate byte operand. A source doubleword can be copied more than once. 

Source doublewords are selected by two-bit fields in the immediate-byte operand. Each field
corresponds to a destination doubleword, as shown:

Destination    Immediate-Byte    Value of     Source
Doubleword     Bit Field         Bit Field    Doubleword
[31:0]         [1:0]             00           [31:0]
                                 01           [63:32]
                                 10           [95:64]
                                 11           [127:96]
[63:32]        [3:2]             00           [31:0]
                                 01           [63:32]
                                 10           [95:64]
                                 11           [127:96]
[95:64]        [5:4]             00           [31:0]
                                 01           [63:32]
                                 10           [95:64]
                                 11           [127:96]
[127:96]       [7:6]             00           [31:0]
                                 01           [63:32]
                                 10           [95:64]
                                 11           [127:96]


For the 256-bit form of the instruction, the same immediate byte selects doublewords in the upper
128 bits of the source operand to be copied to the destination.

Destination    Immediate-Byte    Value of     Source
Doubleword     Bit Field         Bit Field    Doubleword
[159:128]      [1:0]             00           [159:128]
                                 01           [191:160]
                                 10           [223:192]
                                 11           [255:224]
[191:160]      [3:2]             00           [159:128]
                                 01           [191:160]
                                 10           [223:192]
                                 11           [255:224]
[223:192]      [5:4]             00           [159:128]
                                 01           [191:160]
                                 10           [223:192]
                                 11           [255:224]
[255:224]      [7:6]             00           [159:128]
                                 01           [191:160]
                                 10           [223:192]
                                 11           [255:224]
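
The selection tables above reduce to a short loop. The following Python sketch is illustrative only; the function name `pshufd` and the representation of the source as a list of doubleword values (element 0 holding bits [31:0]) are assumptions.

```python
def pshufd(src, imm8):
    """Illustrative model of (V)PSHUFD.

    src: list of 4 doublewords (XMM form) or 8 doublewords (YMM form).
    imm8: immediate byte; bits [2k+1:2k] select the source doubleword
          for destination doubleword k within each 128-bit half.
    """
    if len(src) not in (4, 8) or not 0 <= imm8 <= 0xFF:
        raise ValueError("expected 4 or 8 doublewords and an 8-bit immediate")
    dest = []
    for i in range(len(src)):
        half_base = i & ~3                       # 0 for the lower half, 4 for the upper half
        selector = (imm8 >> (2 * (i & 3))) & 3   # two-bit field for this destination
        dest.append(src[half_base + selector])
    return dest
```

For example, an immediate of 1Bh (00 01 10 11b) reverses the order of the doublewords within each 128-bit half, and an immediate of 00h broadcasts the lowest doubleword of each half.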


There are legacy and extended forms of the instruction: 

PSHUFD 

The source operand is either an XMM register or a 128-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not 
affected. 

VPSHUFD 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The source operand is either an XMM register or a 128-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The source operand is either a YMM register or a 256-bit memory location. The destination is a 
YMM register. 


Instruction Support

Form               Subset    Feature Flag
PSHUFD             SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSHUFD 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSHUFD 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                           Opcode            Description
PSHUFD xmm1, xmm2/mem128, imm8     66 0F 70 /r ib    Copies packed 32-bit values from xmm2 or
                                                     mem128 to xmm1, as specified by imm8.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSHUFD xmm1, xmm2/mem128, imm8    C4    RXB.01           X.1111.0.01   70 /r ib
VPSHUFD ymm1, ymm2/mem256, imm8    C4    RXB.01           X.1111.1.01   70 /r ib

Related Instructions

(V)PSHUFHW, (V)PSHUFLW, (V)PSHUFW

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

                                  Mode
Exception                   Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

PSHUFHW Packed Shuffle

VPSHUFHW High Words

Copies packed word values from the high quadword of the source operand or the upper quadwords of
two halves of the source operand to a word in the high quadword of the destination or the upper
quadwords of two halves of the destination, as specified by bit fields of an immediate byte operand.
A source word can be copied more than once.

Source words are selected by two-bit fields in the immediate-byte operand. Each field corresponds to 
a destination word, as shown: 


Destination    Immediate-Byte    Value of     Source
Word           Bit Field         Bit Field    Word
[79:64]        [1:0]             00           [79:64]
                                 01           [95:80]
                                 10           [111:96]
                                 11           [127:112]
[95:80]        [3:2]             00           [79:64]
                                 01           [95:80]
                                 10           [111:96]
                                 11           [127:112]
[111:96]       [5:4]             00           [79:64]
                                 01           [95:80]
                                 10           [111:96]
                                 11           [127:112]
[127:112]      [7:6]             00           [79:64]
                                 01           [95:80]
                                 10           [111:96]
                                 11           [127:112]

The least-significant quadword of the source is copied to the corresponding quadword of the
destination.

For the 256-bit form of the instruction, the same immediate byte selects words in the most-significant 
quadword of the source operand to be copied to the destination: 


Destination    Immediate-Byte    Value of     Source
Word           Bit Field         Bit Field    Word
[207:192]      [1:0]             00           [207:192]
                                 01           [223:208]
                                 10           [239:224]
                                 11           [255:240]
[223:208]      [3:2]             00           [207:192]
                                 01           [223:208]
                                 10           [239:224]
                                 11           [255:240]
[239:224]      [5:4]             00           [207:192]
                                 01           [223:208]
                                 10           [239:224]
                                 11           [255:240]
[255:240]      [7:6]             00           [207:192]
                                 01           [223:208]
                                 10           [239:224]
                                 11           [255:240]


The least-significant quadword of the upper 128 bits of the source is copied to the corresponding 
quadword of the destination. 
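
The two tables above follow the same pattern in each 128-bit half, which the following sketch makes explicit. It is a minimal Python model under stated assumptions: the function name `pshufhw` is hypothetical, and the source is represented as a list of 16-bit word values with element 0 holding bits [15:0].

```python
def pshufhw(src, imm8):
    """Illustrative model of (V)PSHUFHW.

    src: list of 8 words (XMM form) or 16 words (YMM form).
    In each 128-bit half, the low four words are copied unchanged and
    the high four words are selected from the high quadword by imm8.
    """
    if len(src) not in (8, 16) or not 0 <= imm8 <= 0xFF:
        raise ValueError("expected 8 or 16 words and an 8-bit immediate")
    dest = list(src)                             # low quadwords pass through
    for half_base in range(0, len(src), 8):
        for i in range(4):
            selector = (imm8 >> (2 * i)) & 3     # two-bit field for high word i
            dest[half_base + 4 + i] = src[half_base + 4 + selector]
    return dest
```

With an immediate of 1Bh, the four high words of each half are reversed while the four low words keep their positions.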


There are legacy and extended forms of the instruction:

PSHUFHW 

The source operand is either an XMM register or a 128-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not 
affected. 

VPSHUFHW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The source operand is either an XMM register or a 128-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The source operand is either a YMM register or a 256-bit memory location. The destination is a 
YMM register. 


Instruction Support

Form                Subset    Feature Flag
PSHUFHW             SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSHUFHW 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSHUFHW 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                            Opcode            Description
PSHUFHW xmm1, xmm2/mem128, imm8     F3 0F 70 /r ib    Copies packed 16-bit values from the
                                                      high-order quadword of xmm2 or mem128
                                                      to the high-order quadword of xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSHUFHW xmm1, xmm2/mem128, imm8    C4    RXB.01           X.1111.0.10   70 /r ib
VPSHUFHW ymm1, ymm2/mem256, imm8    C4    RXB.01           X.1111.1.10   70 /r ib


Related Instructions

(V)PSHUFD, (V)PSHUFLW, (V)PSHUFW

rFLAGS Affected 

None 


MXCSR Flags Affected 

None 


Exceptions

                                  Mode
Exception                   Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

PSHUFLW Packed Shuffle

VPSHUFLW Low Words

Copies packed word values from the low quadword of the source operand, or from the low quadword of
each 128-bit half of the source operand, to words in the corresponding low quadword of the
destination, as specified by bit fields of an immediate byte operand. A source word can be copied
more than once.

Source words are selected by two-bit fields in the immediate-byte operand. Each bit field corresponds 
to a destination word, as shown: 


Destination Word   Immediate-Byte Bit Field   Value of Bit Field   Source Word
[15:0]             [1:0]                      00                   [15:0]
                                              01                   [31:16]
                                              10                   [47:32]
                                              11                   [63:48]
[31:16]            [3:2]                      00                   [15:0]
                                              01                   [31:16]
                                              10                   [47:32]
                                              11                   [63:48]
[47:32]            [5:4]                      00                   [15:0]
                                              01                   [31:16]
                                              10                   [47:32]
                                              11                   [63:48]
[63:48]            [7:6]                      00                   [15:0]
                                              01                   [31:16]
                                              10                   [47:32]
                                              11                   [63:48]


The most-significant quadword of the source is copied to the corresponding quadword of the
destination.

For the 256-bit form of the instruction, the same immediate byte selects words in the lower quadword 
of the upper 128 bits of the source operand to be copied to the destination: 


Destination Word   Immediate-Byte Bit Field   Value of Bit Field   Source Word
[143:128]          [1:0]                      00                   [143:128]
                                              01                   [159:144]
                                              10                   [175:160]
                                              11                   [191:176]
[159:144]          [3:2]                      00                   [143:128]
                                              01                   [159:144]
                                              10                   [175:160]
                                              11                   [191:176]
[175:160]          [5:4]                      00                   [143:128]
                                              01                   [159:144]
                                              10                   [175:160]
                                              11                   [191:176]
[191:176]          [7:6]                      00                   [143:128]
                                              01                   [159:144]
                                              10                   [175:160]
                                              11                   [191:176]


The most-significant quadword of the upper 128 bits of the source is copied to the corresponding 
quadword of the destination. 
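
The word selection described above can be sketched as a small Python reference model. This is an illustrative sketch only, not the processor's implementation: the function name `pshuflw` is hypothetical, and the operand is modeled as a Python integer holding the 128-bit or 256-bit register image.

```python
def pshuflw(src: int, imm8: int, bits: int = 128) -> int:
    """Model (V)PSHUFLW on an integer holding a 128-bit or 256-bit value."""
    assert bits in (128, 256) and 0 <= imm8 <= 0xFF
    qword_mask = 0xFFFF_FFFF_FFFF_FFFF
    result = 0
    for lane in range(bits // 128):
        base = 128 * lane
        low_qword = (src >> base) & qword_mask
        high_qword = (src >> (base + 64)) & qword_mask
        shuffled = 0
        for dest_word in range(4):
            # Two immediate bits per destination word select the source word.
            sel = (imm8 >> (2 * dest_word)) & 0b11
            word = (low_qword >> (16 * sel)) & 0xFFFF
            shuffled |= word << (16 * dest_word)
        # The high quadword of each 128-bit half is copied unchanged.
        result |= (shuffled | (high_qword << 64)) << base
    return result
```

In this model an immediate of 1Bh reverses the order of the four low words, and an immediate of E4h returns the operand unchanged, because each two-bit field then selects the source word at its own position.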

There are legacy and extended forms of the instruction:

PSHUFLW 

The source operand is either an XMM register or a 128-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not 
affected. 

VPSHUFLW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The source operand is either an XMM register or a 128-bit memory location. The destination is an 
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The source operand is either a YMM register or a 256-bit memory location. The destination is a 
YMM register. 

Instruction Support 


Form               Subset   Feature Flag
PSHUFLW            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSHUFLW 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSHUFLW 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
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Instruction Encoding 

Mnemonic                            Opcode            Description
PSHUFLW xmm1, xmm2/mem128, imm8     F2 0F 70 /r ib    Copies packed 16-bit values from the low-order
                                                      quadword of xmm2 or mem128 to the low-order
                                                      quadword of xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSHUFLW xmm1, xmm2/mem128, imm8    C4    RXB.01           X.1111.0.11   70 /r ib
VPSHUFLW ymm1, ymm2/mem256, imm8    C4    RXB.01           X.1111.1.11   70 /r ib
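
To show how the fields in the encoding table combine into prefix bytes, the sketch below packs a three-byte VEX prefix. It assumes the standard three-byte VEX layout: byte 1 holds RXB in bits [7:5] and map_select in bits [4:0], and byte 2 holds W, vvvv, L, and pp from high to low. The function name `vex3_prefix` is hypothetical. The arguments are the field values exactly as written in the table, so no inversion is applied.

```python
def vex3_prefix(rxb: int, map_select: int, w: int, vvvv: int, l: int, pp: int) -> bytes:
    """Pack a three-byte VEX prefix from already-encoded field values.

    Hypothetical helper: every argument is the stored bit pattern of its
    field (for example rxb=0b111, vvvv=0b1111), not a logical register number.
    """
    assert 0 <= rxb < 8 and 0 <= map_select < 32
    assert w in (0, 1) and 0 <= vvvv < 16 and l in (0, 1) and 0 <= pp < 4
    byte1 = (rxb << 5) | map_select                    # RXB.map_select
    byte2 = (w << 7) | (vvvv << 3) | (l << 2) | pp     # W.vvvv.L.pp
    return bytes([0xC4, byte1, byte2])
```

For the 128-bit VPSHUFLW row (map_select 01, vvvv 1111, L 0, pp 11), and assuming RXB = 111b (no register extension) with the ignored W bit set to 0, the sketch yields the bytes C4 E1 7B. The opcode 70h, the ModRM byte, and the immediate byte follow the prefix.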


Related Instructions 

(V)PSHUFD, (V)PSHUFHW, (V)PSHUFW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
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PSIGNB Packed Sign 

VPSIGNB Byte 

For each packed signed byte in the first source operand, evaluate the corresponding byte of the second 
source operand and perform one of the following operations. 

• When a byte of the second source is negative, write the two’s-complement of the corresponding 
byte of the first source to the destination. 

• When a byte of the second source is positive, copy the corresponding byte of the first source to the 
destination. 

• When a byte of the second source is zero, clear the corresponding byte of the destination. 
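
The three cases above can be sketched as a per-element reference model. This is an illustrative sketch under stated assumptions: the function name `psign` is hypothetical, operands are modeled as Python lists of signed element values, and the `width` parameter is an addition that lets the same sketch cover the word and doubleword variants.

```python
from typing import List


def psign(src1: List[int], src2: List[int], width: int = 8) -> List[int]:
    """Model (V)PSIGNB/W/D on lists of signed elements of the given bit width."""
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)

    def to_signed(value: int) -> int:
        # Wrap an arbitrary integer into the signed range of the element width.
        value &= mask
        return value - (1 << width) if value & sign_bit else value

    result = []
    for a, b in zip(src1, src2):
        if b < 0:
            result.append(to_signed(-a))  # two's complement of the first-source element
        elif b > 0:
            result.append(a)              # copied unchanged
        else:
            result.append(0)              # cleared
    return result
```

Because the negation wraps at the element width, the most negative value (-128 for bytes) maps to itself in this model.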


There are legacy and extended forms of the instruction: 

PSIGNB 

The first source operand is an XMM register and the second source operand is either an XMM
register or a 128-bit memory location. The first source XMM register is also the destination. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.

VPSIGNB 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register and the second source operand is either an XMM
register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the
YMM register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form               Subset   Feature Flag
PSIGNB             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPSIGNB 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSIGNB 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
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Instruction Encoding 


Mnemonic                            Opcode            Description
PSIGNB xmm1, xmm2/mem128            66 0F 38 08 /r    Perform operation based on evaluation of each packed
                                                      8-bit signed integer value in xmm2 or mem128.
                                                      Write 8-bit signed results to xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSIGNB xmm1, xmm2, xmm3/mem128     C4    RXB.02           X.src1.0.01   08 /r
VPSIGNB ymm1, ymm2, ymm3/mem256     C4    RXB.02           X.src1.1.01   08 /r


Related Instructions 

(V)PSIGNW, (V)PSIGND 

rFLAGS Affected 

None 


MXCSR Flags Affected 

None 


Exceptions 


Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
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PSIGND Packed Sign 

VPSIGND Doubleword 

For each packed signed doubleword in the first source operand, evaluate the corresponding
doubleword of the second source operand and perform one of the following operations.

• When a doubleword of the second source is negative, write the two’s-complement of the 
corresponding doubleword of the first source to the destination. 

• When a doubleword of the second source is positive, copy the corresponding doubleword of the 
first source to the destination. 

• When a doubleword of the second source is zero, clear the corresponding doubleword of the 
destination. 
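
As a sketch of the same rules applied to a packed register image, the model below treats each operand as a Python integer holding four (128-bit) or eight (256-bit) doublewords. The function name `psignd` is hypothetical. The sketch also makes one consequence of two's-complement negation visible: a first-source doubleword of 80000000h negates to itself.

```python
def psignd(src1: int, src2: int, bits: int = 128) -> int:
    """Model (V)PSIGND on integers holding packed 32-bit elements."""
    assert bits in (128, 256)
    mask = 0xFFFF_FFFF
    result = 0
    for i in range(bits // 32):
        a = (src1 >> (32 * i)) & mask
        b = (src2 >> (32 * i)) & mask
        if b == 0:
            r = 0                  # second-source element is zero: clear
        elif b & 0x8000_0000:
            r = (-a) & mask        # second-source element is negative: negate
        else:
            r = a                  # second-source element is positive: copy
        result |= r << (32 * i)
    return result
```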


There are legacy and extended forms of the instruction: 

PSIGND 

The first source operand is an XMM register and the second source operand is either an XMM
register or a 128-bit memory location. The first source XMM register is also the destination. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.

VPSIGND 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The first source operand is an XMM register and the second source operand is either an XMM
register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the
YMM register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form               Subset   Feature Flag
PSIGND             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPSIGND 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSIGND 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
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Instruction Encoding 

Mnemonic                            Opcode            Description
PSIGND xmm1, xmm2/mem128            66 0F 38 0A /r    Perform operation based on evaluation of each packed
                                                      32-bit signed integer value in xmm2 or mem128.
                                                      Write 32-bit signed results to xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSIGND xmm1, xmm2, xmm3/mem128     C4    RXB.02           X.src1.0.01   0A /r
VPSIGND ymm1, ymm2, ymm3/mem256     C4    RXB.02           X.src1.1.01   0A /r


Related Instructions 

(V)PSIGNB, (V)PSIGNW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
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PSIGNW Packed Sign 

VPSIGNW Word 

For each packed signed word in the first source operand, evaluate the corresponding word of the
second source operand and perform one of the following operations.

• When a word of the second source is negative, write the two’s-complement of the corresponding 
word of the first source to the destination. 

• When a word of the second source is positive, copy the corresponding word of the first source to 
the destination. 

• When a word of the second source is zero, clear the corresponding word of the destination. 


There are legacy and extended forms of the instruction: 

PSIGNW 

The first source operand is an XMM register and the second source operand is either an XMM
register or a 128-bit memory location. The first source XMM register is also the destination. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.

VPSIGNW 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register and the second source operand is either an XMM
register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the
YMM register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form               Subset   Feature Flag
PSIGNW             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPSIGNW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSIGNW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
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Instruction Encoding 

Mnemonic                            Opcode            Description
PSIGNW xmm1, xmm2/mem128            66 0F 38 09 /r    Perform operation based on evaluation of each packed
                                                      16-bit signed integer value in xmm2 or mem128.
                                                      Write 16-bit signed results to xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSIGNW xmm1, xmm2, xmm3/mem128     C4    RXB.02           X.src1.0.01   09 /r
VPSIGNW ymm1, ymm2, ymm3/mem256     C4    RXB.02           X.src1.1.01   09 /r


Related Instructions 

(V)PSIGNB, (V)PSIGND 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
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PSLLD Packed Shift Left Logical 

VPSLLD Doublewords 

Left-shifts each packed 32-bit value in the source operand as specified by a shift-count operand and 
writes the shifted values to the destination. 

The shift-count operand can be an immediate byte, a second register, or a memory location. The shift
count is treated as an unsigned integer. When the shift count is provided by a register or memory
location, only bits [63:0] of the value are considered.

Low-order bits emptied by shifting are cleared. When the shift count is greater than 31, the
destination is cleared.
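
The shift-count rules above can be sketched as a small reference model. This is an illustrative sketch only: the function name `pslld` is hypothetical, and the source operand is modeled as a Python integer holding packed doublewords. Note that the count is compared against 31 as a full unsigned value and is not reduced modulo 32.

```python
def pslld(src: int, count: int, bits: int = 128) -> int:
    """Model (V)PSLLD: shift each packed doubleword left by the same count."""
    assert bits in (128, 256)
    count &= 0xFFFF_FFFF_FFFF_FFFF   # only bits [63:0] of a register or memory count are used
    if count > 31:
        return 0                     # every doubleword is cleared
    result = 0
    for i in range(bits // 32):
        dword = (src >> (32 * i)) & 0xFFFF_FFFF
        # Bits shifted past bit 31 of an element are discarded, not carried over.
        result |= ((dword << count) & 0xFFFF_FFFF) << (32 * i)
    return result
```

In this model a count of 32 clears the destination rather than leaving the source unchanged, which is the difference from a count taken modulo the element width.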


There are legacy and extended forms of the instruction:

PSLLD

There are two forms of the instruction, based on the type of count operand.

The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM
register is also the destination. Bits [255:128] of the YMM register that corresponds to the
destination are not affected.

VPSLLD 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

There are two 128-bit encodings. These differ based on the type of count operand. 

The first source operand is an XMM register. The shift count is specified by either a second XMM 
register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM 
register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits 
[255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

There are two 256-bit encodings. These differ based on the type of count operand. 

The first source operand is a YMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM
register. For the immediate operand encoding, the destination is specified by VEX.vvvv.

Instruction Support 


Form               Subset   Feature Flag
PSLLD              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSLLD 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSLLD 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
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Instruction Encoding 


Mnemonic                            Opcode            Description
PSLLD xmm1, xmm2/mem128             66 0F F2 /r       Left-shifts packed doublewords in xmm1 as specified
                                                      by xmm2[63:0] or mem128[63:0].
PSLLD xmm, imm8                     66 0F 72 /6 ib    Left-shifts packed doublewords in xmm as specified by
                                                      imm8.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSLLD xmm1, xmm2, xmm3/mem128      C4    RXB.01           X.src1.0.01   F2 /r
VPSLLD xmm1, xmm2, imm8             C4    RXB.01           X.dest.0.01   72 /6 ib
VPSLLD ymm1, ymm2, xmm3/mem128      C4    RXB.01           X.src1.1.01   F2 /r
VPSLLD ymm1, ymm2, imm8             C4    RXB.01           X.dest.1.01   72 /6 ib


Related Instructions 

(V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, 
(V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 



26568 — Rev. 3.23—February 2019 


Exceptions 


Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                              A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A     Memory address exceeding data segment limit or non-canonical.
                                        A     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     When alignment checking enabled: 128-bit memory operand not 16-byte aligned; 256-bit memory operand not 32-byte aligned.
Page fault, #PF                         A     Instruction execution caused a page fault.

X - AVX, AVX2, and SSE exception
A - AVX and AVX2 exception
S - SSE exception
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PSLLDQ Packed Shift Left Logical 

VPSLLDQ Double Quadword 


Left-shifts the double quadword value (128-bit forms) or each of the two double quadword values
(256-bit form) in the source operand by the number of bytes specified by an immediate byte operand
and writes the shifted values to the destination.

The immediate byte operand supplies an unsigned shift count. Low-order bytes emptied by shifting 
are cleared. When the shift value is greater than 15, the destination is cleared. For the 256-bit form of
the instruction, the shift count is applied to both the upper and the lower double quadword. Bytes 
shifted out of the lower 128 bits are not shifted into the upper. 
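
The per-lane byte shift described above can be sketched as follows. This is an illustrative model, not the processor's implementation: the function name `pslldq` is hypothetical, and the operand is modeled as a Python integer holding the 128-bit or 256-bit register image.

```python
def pslldq(src: int, imm8: int, bits: int = 128) -> int:
    """Model (V)PSLLDQ: byte-wise left shift within each 128-bit lane."""
    assert bits in (128, 256) and 0 <= imm8 <= 0xFF
    lane_mask = (1 << 128) - 1
    result = 0
    for lane in range(bits // 128):
        value = (src >> (128 * lane)) & lane_mask
        # A count above 15 clears the lane. Otherwise bytes shifted out of
        # the top of the lane are dropped and never enter the other lane.
        shifted = 0 if imm8 > 15 else (value << (8 * imm8)) & lane_mask
        result |= shifted << (128 * lane)
    return result
```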


There are legacy and extended forms of the instruction: 

PSLLDQ 

The source XMM register is also the destination. Bits [255:128] of the YMM register that
corresponds to the destination are not affected.

VPSLLDQ 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The source operand is an XMM register. The destination is an XMM register specified by VEX.vvvv. 
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding 

The source operand is a YMM register. The destination is a YMM register specified by VEX.vvvv. 


Instruction Support 


Form               Subset   Feature Flag
PSLLDQ             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSLLDQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSLLDQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 


Mnemonic                            Opcode            Description
PSLLDQ xmm, imm8                    66 0F 73 /7 ib    Left-shifts double quadword value in xmm1 as specified
                                                      by imm8.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSLLDQ xmm1, xmm2, imm8            C4    RXB.01           0.dest.0.01   73 /7 ib
VPSLLDQ ymm1, ymm2, imm8            C4    RXB.01           0.dest.1.01   73 /7 ib


Related Instructions 

(V)PSLLD, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, (V)PSRLQ, 
(V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ 
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rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
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PSLLQ Packed Shift Left Logical 

VPSLLQ Quadwords 

Left-shifts each packed 64-bit value in the source operand as specified by a shift-count operand and 
writes the shifted values to the destination. 

The shift-count operand can be an immediate byte, a second register, or a memory location. The shift
count is treated as an unsigned integer. When the shift count is provided by a register or memory
location, only bits [63:0] of the value are considered.

Low-order bits emptied by shifting are cleared. When the shift value is greater than 63, the
destination is cleared.


There are legacy and extended forms of the instruction:

PSLLQ

There are two forms of the instruction, based on the type of count operand.

The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM
register is also the destination. Bits [255:128] of the YMM register that corresponds to the
destination are not affected.

VPSLLQ 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

There are two 128-bit encodings. These differ based on the type of count operand. 

The first source operand is an XMM register. The shift count is specified by either a second XMM 
register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM 
register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits 
[255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

There are two 256-bit encodings. These differ based on the type of count operand. 

The first source operand is a YMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM
register. For the immediate operand encoding, the destination is specified by VEX.vvvv.

Instruction Support 


Form               Subset   Feature Flag
PSLLQ              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSLLQ 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSLLQ 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
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Instruction Encoding 

Mnemonic                            Opcode            Description
PSLLQ xmm1, xmm2/mem128             66 0F F3 /r       Left-shifts packed quadwords in xmm1 as specified by
                                                      xmm2[63:0] or mem128[63:0].
PSLLQ xmm, imm8                     66 0F 73 /6 ib    Left-shifts packed quadwords in xmm as specified by
                                                      imm8.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSLLQ xmm1, xmm2, xmm3/mem128      C4    RXB.01           X.src1.0.01   F3 /r
VPSLLQ xmm1, xmm2, imm8             C4    RXB.01           X.dest.0.01   73 /6 ib
VPSLLQ ymm1, ymm2, xmm3/mem128      C4    RXB.01           X.src1.1.01   F3 /r
VPSLLQ ymm1, ymm2, imm8             C4    RXB.01           X.dest.1.01   73 /6 ib

Related Instructions

(V)PSLLD, (V)PSLLDQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ,
(V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ

rFLAGS Affected 

None 


MXCSR Flags Affected 

None 
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Exceptions 


Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                              A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A     Memory address exceeding data segment limit or non-canonical.
                                        A     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     When alignment checking enabled: 128-bit memory operand not 16-byte aligned; 256-bit memory operand not 32-byte aligned.
Page fault, #PF                         A     Instruction execution caused a page fault.

X - AVX, AVX2, and SSE exception
A - AVX and AVX2 exception
S - SSE exception
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PSLLW Packed Shift Left Logical 

VPSLLW Words 

Left-shifts each packed 16-bit value in the source operand as specified by a shift-count operand and 
writes the shifted values to the destination. 

The shift-count operand can be an immediate byte, a second register, or a memory location. The shift
count is treated as an unsigned integer. When the shift count is provided by a register or memory
location, only bits [63:0] of the value are considered.

Low-order bits emptied by shifting are cleared. When the shift count is greater than 15, the
destination is cleared.


There are legacy and extended forms of the instruction:

PSLLW

There are two forms of the instruction, based on the type of count operand.

The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM
register is also the destination. Bits [255:128] of the YMM register that corresponds to the
destination are not affected.

VPSLLW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

There are two 128-bit encodings. These differ based on the type of count operand. 

The first source operand is an XMM register. The shift count is specified by either a second XMM 
register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM 
register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits 
[255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

There are two 256-bit encodings. These differ based on the type of count operand. 

The first source operand is a YMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM
register. For the immediate operand encoding, the destination is specified by VEX.vvvv.

Instruction Support 


Form               Subset   Feature Flag
PSLLW              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSLLW 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSLLW 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
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Instruction Encoding 

Mnemonic                            Opcode            Description
PSLLW xmm1, xmm2/mem128             66 0F F1 /r       Left-shifts packed words in xmm1 as specified by
                                                      xmm2[63:0] or mem128[63:0].
PSLLW xmm, imm8                     66 0F 71 /6 ib    Left-shifts packed words in xmm as specified by imm8.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSLLW xmm1, xmm2, xmm3/mem128      C4    RXB.01           X.src1.0.01   F1 /r
VPSLLW xmm1, xmm2, imm8             C4    RXB.01           X.dest.0.01   71 /6 ib
VPSLLW ymm1, ymm2, xmm3/mem128      C4    RXB.01           X.src1.1.01   F1 /r
VPSLLW ymm1, ymm2, imm8             C4    RXB.01           X.dest.1.01   71 /6 ib

Related Instructions

(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ,
(V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ

rFLAGS Affected 

None 


MXCSR Flags Affected 

None 
Exceptions

Exception                    Mode              Cause of Exception
                             Real  Virt  Prot
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                               A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  A     Memory address exceeding data segment limit or non-canonical.
                                         A     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     When alignment checking enabled:
                                               • 128-bit memory operand not 16-byte aligned.
                                               • 256-bit memory operand not 32-byte aligned.
Page fault, #PF                          A     Instruction execution caused a page fault.

X — AVX, AVX2, and SSE exception 

A — AVX and AVX2 exception 

S — SSE exception 
PSRAD Packed Shift Right Arithmetic

VPSRAD Doublewords 

Right-shifts each packed 32-bit value in the source operand as specified by a shift-count operand and 
writes the shifted values to the destination. 

The shift-count operand can be an immediate byte, a second register, or a memory location. The shift 
count is treated as an unsigned integer. When the shift count is provided by a register or memory location,
only bits [63:0] of the value are considered.

High-order bits emptied by shifting are filled with the sign bit of the initial value. When the shift 
value is greater than 31, each doubleword of the destination is filled with the sign bit of its initial 
value. 
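
The behavior described above can be modeled in portable scalar C. This is a minimal sketch of the 128-bit case only; the function names are hypothetical, and the sign fill is built explicitly so that the code does not depend on how a compiler shifts negative values.

```c
#include <stdint.h>

/* Arithmetic right shift of one doubleword.
 * Counts above 31 behave like 31: the result is all sign bits. */
static int32_t sra32(int32_t v, uint64_t count)
{
    unsigned n = count > 31 ? 31u : (unsigned)count;
    uint32_t u = (uint32_t)v;
    /* Mask of the n high-order bits, used only for negative inputs. */
    uint32_t fill = (v < 0 && n != 0) ? ~(UINT32_MAX >> n) : 0u;
    return (int32_t)((u >> n) | fill);
}

/* Hypothetical model of PSRAD on four packed doublewords. */
static void psrad_model(int32_t dst[4], const int32_t src[4], uint64_t count)
{
    for (int i = 0; i < 4; i++)
        dst[i] = sra32(src[i], count);
}
```

With a count of 32, for example, every negative element becomes -1 and every non-negative element becomes 0.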


There are legacy and extended forms of the instruction: 

PSRAD 

There are two forms of the instruction, based on the type of count operand.

The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register
is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are
not affected. 

VPSRAD 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

There are two 128-bit encodings. These differ based on the type of count operand. 

The first source operand is an XMM register. The shift count is specified by either a second XMM 
register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM 
register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits 
[255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

There are two 256-bit encodings. These differ based on the type of count operand. 

The first source operand is a YMM register. The shift count is specified by either a second XMM register
or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register.
For the immediate operand encoding, the destination is specified by VEX.vvvv.

Instruction Support

Form               Subset   Feature Flag
PSRAD              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSRAD 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSRAD 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                         Opcode           Description
PSRAD xmm1, xmm2/mem128          66 0F E2 /r      Right-shifts packed doublewords in xmm1 as specified
                                                  by xmm2[63:0] or mem128[63:0].
PSRAD xmm, imm8                  66 0F 72 /4 ib   Right-shifts packed doublewords in xmm as specified
                                                  by imm8.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSRAD xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   E2 /r
VPSRAD xmm1, xmm2, imm8          C4    RXB.01           X.dest.0.01   72 /4 ib
VPSRAD ymm1, ymm2, xmm3/mem128   C4    RXB.01           X.src1.1.01   E2 /r
VPSRAD ymm1, ymm2, imm8          C4    RXB.01           X.dest.1.01   72 /4 ib

Related Instructions

(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAW, (V)PSRLD, (V)PSRLDQ,
(V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ

rFLAGS Affected 

None 


MXCSR Flags Affected 

None 
Exceptions

Exception                    Mode              Cause of Exception
                             Real  Virt  Prot
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                               A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  A     Memory address exceeding data segment limit or non-canonical.
                                         A     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     When alignment checking enabled:
                                               • 128-bit memory operand not 16-byte aligned.
                                               • 256-bit memory operand not 32-byte aligned.
Page fault, #PF                          A     Instruction execution caused a page fault.

X — AVX, AVX2, and SSE exception 

A — AVX and AVX2 exception 

S — SSE exception 
PSRAW Packed Shift Right Arithmetic

VPSRAW Words 

Right-shifts each packed 16-bit value in the source operand as specified by a shift-count operand and 
writes the shifted values to the destination. 

The shift-count operand can be an immediate byte, a second register, or a memory location. The shift 
count is treated as an unsigned integer. When the shift count is provided by a register or memory location,
only bits [63:0] of the value are considered.

High-order bits emptied by shifting are filled with the sign bit of the initial value. When the shift
value is greater than 15, each word of the destination is filled with the sign bit of its initial
value.
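
A short scalar C sketch of the word case follows. It assumes the count has already been taken from the immediate byte or from bits [63:0] of the register or memory operand; the function name is hypothetical. The full 64-bit count is compared, so a count whose low byte is zero but whose upper bits are set still fills each word with its sign bit.

```c
#include <stdint.h>

/* Hypothetical model of PSRAW on eight packed words. */
static void psraw_model(int16_t dst[8], const int16_t src[8], uint64_t count)
{
    /* Any count above 15 gives the same result as a shift by 15. */
    unsigned n = count > 15 ? 15u : (unsigned)count;

    for (int i = 0; i < 8; i++) {
        uint16_t u = (uint16_t)src[i];
        /* Mask of the n high-order bits, applied only to negative inputs. */
        uint16_t fill = (src[i] < 0) ? (uint16_t)~(0xFFFFu >> n) : 0u;
        dst[i] = (int16_t)(uint16_t)((u >> n) | fill);
    }
}
```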


There are legacy and extended forms of the instruction: 

PSRAW

There are two forms of the instruction, based on the type of count operand.

The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register
is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are
not affected. 

VPSRAW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

There are two 128-bit encodings. These differ based on the type of count operand. 

The first source operand is an XMM register. The shift count is specified by either a second XMM 
register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM 
register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits 
[255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

There are two 256-bit encodings. These differ based on the type of count operand. 

The first source operand is a YMM register. The shift count is specified by either a second XMM register
or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register.
For the immediate operand encoding, the destination is specified by VEX.vvvv.

Instruction Support

Form               Subset   Feature Flag
PSRAW              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSRAW 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSRAW 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.




Instruction Encoding

Mnemonic                         Opcode           Description
PSRAW xmm1, xmm2/mem128          66 0F E1 /r      Right-shifts packed words in xmm1 as specified by
                                                  xmm2[63:0] or mem128[63:0].
PSRAW xmm, imm8                  66 0F 71 /4 ib   Right-shifts packed words in xmm as specified by imm8.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSRAW xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   E1 /r
VPSRAW xmm1, xmm2, imm8          C4    RXB.01           X.dest.0.01   71 /4 ib
VPSRAW ymm1, ymm2, xmm3/mem128   C4    RXB.01           X.src1.1.01   E1 /r
VPSRAW ymm1, ymm2, imm8          C4    RXB.01           X.dest.1.01   71 /4 ib

Related Instructions

(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRLD, (V)PSRLDQ,
(V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ

rFLAGS Affected 

None 


MXCSR Flags Affected 

None 
Exceptions

Exception                    Mode              Cause of Exception
                             Real  Virt  Prot
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                               A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  A     Memory address exceeding data segment limit or non-canonical.
                                         A     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     When alignment checking enabled:
                                               • 128-bit memory operand not 16-byte aligned.
                                               • 256-bit memory operand not 32-byte aligned.
Page fault, #PF                          A     Instruction execution caused a page fault.

X — AVX, AVX2, and SSE exception 

A — AVX and AVX2 exception 

S — SSE exception 
PSRLD Packed Shift Right Logical

VPSRLD Doublewords 

Right-shifts each packed 32-bit value in the source operand as specified by a shift-count operand and 
writes the shifted values to the destination. 

The shift-count operand can be an immediate byte, a second register, or a memory location. The shift 
count is treated as an unsigned integer. When the shift count is provided by a register or memory location,
only bits [63:0] of the value are considered.

High-order bits emptied by shifting are cleared. When the shift value is greater than 31, the destination
is cleared.
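
The logical shift differs from the arithmetic one only in what fills the vacated bits. A minimal scalar C sketch of the 128-bit case, with a hypothetical function name, could look like this:

```c
#include <stdint.h>

/* Hypothetical model of PSRLD on four packed doublewords.
 * Vacated high-order bits are zero; a count above 31 clears every element. */
static void psrld_model(uint32_t dst[4], const uint32_t src[4], uint64_t count)
{
    for (int i = 0; i < 4; i++)
        dst[i] = count > 31 ? 0u : (uint32_t)(src[i] >> count);
}
```

The explicit range check matters in C, because shifting a 32-bit value by 32 or more is undefined behavior in the language, whereas the instruction defines the result as zero.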


There are legacy and extended forms of the instruction:

PSRLD 

There are two forms of the instruction, based on the type of count operand. 

The first source operand is an XMM register. The shift count is specified by either a second XMM 
register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register
is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are
not affected. 

VPSRLD 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

There are two 128-bit encodings. These differ based on the type of count operand. 

The first source operand is an XMM register. The shift count is specified by either a second XMM 
register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM 
register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits 
[255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

There are two 256-bit encodings. These differ based on the type of count operand. 

The first source operand is a YMM register. The shift count is specified by either a second XMM register
or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register.
For the immediate operand encoding, the destination is specified by VEX.vvvv.

Instruction Support

Form               Subset   Feature Flag
PSRLD              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSRLD 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSRLD 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                         Opcode           Description
PSRLD xmm1, xmm2/mem128          66 0F D2 /r      Right-shifts packed doublewords in xmm1 as specified
                                                  by xmm2[63:0] or mem128[63:0].
PSRLD xmm, imm8                  66 0F 72 /2 ib   Right-shifts packed doublewords in xmm as specified
                                                  by imm8.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSRLD xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   D2 /r
VPSRLD xmm1, xmm2, imm8          C4    RXB.01           X.dest.0.01   72 /2 ib
VPSRLD ymm1, ymm2, xmm3/mem128   C4    RXB.01           X.src1.1.01   D2 /r
VPSRLD ymm1, ymm2, imm8          C4    RXB.01           X.dest.1.01   72 /2 ib

Related Instructions

(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLDQ,
(V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ

rFLAGS Affected 

None 


MXCSR Flags Affected 

None 
Exceptions

Exception                    Mode              Cause of Exception
                             Real  Virt  Prot
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                               A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  A     Memory address exceeding data segment limit or non-canonical.
                                         A     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     When alignment checking enabled:
                                               • 128-bit memory operand not 16-byte aligned.
                                               • 256-bit memory operand not 32-byte aligned.
Page fault, #PF                          A     Instruction execution caused a page fault.

X — AVX, AVX2, and SSE exception 

A — AVX and AVX2 exception 

S — SSE exception 
PSRLDQ Packed Shift Right Logical

VPSRLDQ Double Quadword 

Right-shifts one or each of two double quadword values in the source operand the number of bytes 
specified by an immediate byte operand and writes the shifted values to the destination. 

The immediate byte operand supplies an unsigned shift count. High-order bytes emptied by shifting 
are cleared. When the shift value is greater than 15, the destination is cleared. For the 256-bit form of
the instruction, the shift count is applied to both the upper and the lower double quadword. Bytes 
shifted out of the upper 128 bits are not shifted into the lower. 
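
The lane-wise behavior of the 256-bit form can be sketched in scalar C as follows. The register is modeled as a little-endian byte array (index 0 is the least-significant byte), and both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Shift one 16-byte lane right by n bytes, filling with zeros.
 * A count above 15 clears the lane. */
static void shift_lane_right(uint8_t lane[16], unsigned n)
{
    uint8_t out[16] = { 0 };
    if (n < 16)
        memcpy(out, lane + n, 16 - n);
    memcpy(lane, out, 16);
}

/* Hypothetical model of the 256-bit VPSRLDQ: the same count is applied to
 * the lower and upper double quadword independently, so nothing crosses
 * from the upper lane into the lower one. */
static void vpsrldq256_model(uint8_t v[32], uint8_t imm8)
{
    shift_lane_right(v, imm8);
    shift_lane_right(v + 16, imm8);
}
```

After a shift by 4, for instance, bytes 12 through 15 of the lower lane are zero rather than being filled from the upper lane.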


There are legacy and extended forms of the instruction: 

PSRLDQ 

The source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds
to the destination are not affected.

VPSRLDQ 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The source operand is an XMM register. The destination is an XMM register specified by VEX.vvvv. 
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding 

The source operand is a YMM register. The destination is a YMM register specified by VEX.vvvv. 


Instruction Support

Form               Subset   Feature Flag
PSRLDQ             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSRLDQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSRLDQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                         Opcode           Description
PSRLDQ xmm, imm8                 66 0F 73 /3 ib   Right-shifts double quadword value in xmm1 as specified by
                                                  imm8.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSRLDQ xmm1, xmm2, imm8         C4    RXB.01           X.dest.0.01   73 /3 ib
VPSRLDQ ymm1, ymm2, imm8         C4    RXB.01           X.dest.1.01   73 /3 ib
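
To make the encoding fields concrete, the sketch below assembles the six bytes of the three-byte (C4) VEX form of VPSRLDQ. It is an illustration under stated assumptions, not an assembler: the function name is hypothetical, VEX.W (shown as X, ignored) is emitted as 0, and the R, X, B, and vvvv fields are written in the inverted form that the VEX prefix uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder for VPSRLDQ dest, src, imm8.
 * dest and src are register numbers 0-15; is256 selects the YMM form. */
static size_t encode_vpsrldq(uint8_t out[6], unsigned dest, unsigned src,
                             int is256, uint8_t imm8)
{
    unsigned b_bit = (src >> 3) & 1u;   /* high bit of the ModRM.r/m register */

    out[0] = 0xC4;
    /* Inverted R and X are 1 (unused); inverted B; map_select = 00001b. */
    out[1] = (uint8_t)(0x80u | 0x40u | ((b_bit ^ 1u) << 5) | 0x01u);
    /* W = 0; vvvv = inverted dest; L = 1 for 256-bit; pp = 01b (66 prefix). */
    out[2] = (uint8_t)(((~dest & 0xFu) << 3) | ((is256 ? 1u : 0u) << 2) | 0x01u);
    out[3] = 0x73;                                      /* opcode */
    out[4] = (uint8_t)(0xC0u | (3u << 3) | (src & 7u)); /* mod = 11b, reg = /3 */
    out[5] = imm8;
    return 6;
}
```

Under these assumptions, VPSRLDQ xmm1, xmm2, 3 comes out as C4 E1 71 73 DA 03. Assemblers commonly emit the shorter two-byte (C5) prefix when the R, X, B, W, and map_select fields allow it.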
Related Instructions

(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLQ, 
(V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

Exception                    Mode              Cause of Exception
                             Real  Virt  Prot
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PSRLQ Packed Shift Right Logical

VPSRLQ Quadwords 

Right-shifts each packed 64-bit value in the source operand as specified by a shift-count operand and 
writes the shifted values to the destination. 

The shift-count operand can be an immediate byte, a second register, or a memory location. The shift 
count is treated as an unsigned integer. When the shift count is provided by a register or memory location,
only bits [63:0] of the value are considered.

High-order bits emptied by shifting are cleared. When the shift value is greater than 63, the destination
is cleared.
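
The rule that only bits [63:0] of a register or memory count are considered can be illustrated with a scalar C sketch. The 128-bit count operand is modeled as two 64-bit halves, with index 0 holding bits [63:0]; the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of PSRLQ with a register or memory count operand.
 * count128[0] holds bits [63:0]; count128[1] (bits [127:64]) is ignored. */
static void psrlq_model(uint64_t dst[2], const uint64_t src[2],
                        const uint64_t count128[2])
{
    uint64_t count = count128[0];

    for (int i = 0; i < 2; i++)
        dst[i] = count > 63 ? 0u : src[i] >> count;
}
```

A count operand whose upper half is nonzero therefore shifts exactly as if the upper half were zero.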


There are legacy and extended forms of the instruction:

PSRLQ

There are two forms of the instruction, based on the type of count operand.

The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register
is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are
not affected. 

VPSRLQ 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

There are two 128-bit encodings. These differ based on the type of count operand. 

The first source operand is an XMM register. The shift count is specified by either a second XMM 
register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM 
register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits 
[255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

There are two 256-bit encodings. These differ based on the type of count operand. 

The first source operand is a YMM register. The shift count is specified by either a second XMM register
or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register.
For the immediate operand encoding, the destination is specified by VEX.vvvv.

Instruction Support

Form               Subset   Feature Flag
PSRLQ              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSRLQ 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSRLQ 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.




Instruction Encoding

Mnemonic                         Opcode           Description
PSRLQ xmm1, xmm2/mem128          66 0F D3 /r      Right-shifts packed quadwords in xmm1 as specified
                                                  by xmm2[63:0] or mem128[63:0].
PSRLQ xmm, imm8                  66 0F 73 /2 ib   Right-shifts packed quadwords in xmm as specified by
                                                  imm8.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSRLQ xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   D3 /r
VPSRLQ xmm1, xmm2, imm8          C4    RXB.01           X.dest.0.01   73 /2 ib
VPSRLQ ymm1, ymm2, xmm3/mem128   C4    RXB.01           X.src1.1.01   D3 /r
VPSRLQ ymm1, ymm2, imm8          C4    RXB.01           X.dest.1.01   73 /2 ib

Related Instructions

(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD,
(V)PSRLDQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ

rFLAGS Affected 

None 


MXCSR Flags Affected 

None 
Exceptions

Exception                    Mode              Cause of Exception
                             Real  Virt  Prot
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                               A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  A     Memory address exceeding data segment limit or non-canonical.
                                         A     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     When alignment checking enabled:
                                               • 128-bit memory operand not 16-byte aligned.
                                               • 256-bit memory operand not 32-byte aligned.
Page fault, #PF                          A     Instruction execution caused a page fault.

X — AVX, AVX2, and SSE exception 

A — AVX and AVX2 exception 

S — SSE exception 
PSRLW Packed Shift Right Logical

VPSRLW Words 

Right-shifts each packed 16-bit value in the source operand as specified by a shift-count operand and 
writes the shifted values to the destination. 

The shift-count operand can be an immediate byte, a second register, or a memory location. The shift 
count is treated as an unsigned integer. When the shift count is provided by a register or memory location,
only bits [63:0] of the value are considered.

High-order bits emptied by shifting are cleared. When the shift value is greater than 15, the destination
is cleared.


There are legacy and extended forms of the instruction:

PSRLW

There are two forms of the instruction, based on the type of count operand.

The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register
is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are
not affected. 

VPSRLW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

There are two 128-bit encodings. These differ based on the type of count operand. 

The first source operand is an XMM register. The shift count is specified by either a second XMM 
register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM 
register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits 
[255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

There are two 256-bit encodings. These differ based on the type of count operand. 

The first source operand is a YMM register. The shift count is specified by either a second XMM register
or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register.
For the immediate operand encoding, the destination is specified by VEX.vvvv.
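
The three forms differ mainly in how they treat bits [255:128] of the destination. The scalar C sketch below models a YMM register as sixteen words, with indices 8 through 15 standing for bits [255:128]; all function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Logical right shift of one word; a count above 15 yields zero. */
static uint16_t srl16(uint16_t v, uint64_t count)
{
    return count > 15 ? 0u : (uint16_t)(v >> count);
}

/* Legacy PSRLW: the first source is also the destination, and the
 * upper half of the corresponding YMM register is not affected. */
static void psrlw_legacy(uint16_t ymm[16], uint64_t count)
{
    for (int i = 0; i < 8; i++)
        ymm[i] = srl16(ymm[i], count);
}

/* VPSRLW, 128-bit encoding: separate destination, upper half cleared. */
static void vpsrlw_xmm(uint16_t dst[16], const uint16_t src[16], uint64_t count)
{
    for (int i = 0; i < 8; i++)
        dst[i] = srl16(src[i], count);
    memset(dst + 8, 0, 8 * sizeof dst[0]);
}

/* VPSRLW, 256-bit encoding: all sixteen words are shifted. */
static void vpsrlw_ymm(uint16_t dst[16], const uint16_t src[16], uint64_t count)
{
    for (int i = 0; i < 16; i++)
        dst[i] = srl16(src[i], count);
}
```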

Instruction Support

Form               Subset   Feature Flag
PSRLW              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSRLW 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSRLW 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                         Opcode           Description
PSRLW xmm1, xmm2/mem128          66 0F D1 /r      Right-shifts packed words in xmm1 as specified by
                                                  xmm2[63:0] or mem128[63:0].
PSRLW xmm, imm8                  66 0F 71 /2 ib   Right-shifts packed words in xmm as specified by imm8.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSRLW xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   D1 /r
VPSRLW xmm1, xmm2, imm8          C4    RXB.01           X.dest.0.01   71 /2 ib
VPSRLW ymm1, ymm2, xmm3/mem128   C4    RXB.01           X.src1.1.01   D1 /r
VPSRLW ymm1, ymm2, imm8          C4    RXB.01           X.dest.1.01   71 /2 ib

Related Instructions

(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD,
(V)PSRLDQ, (V)PSRLQ, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ

rFLAGS Affected 

None 


MXCSR Flags Affected 

None 
Exceptions

Exception                    Mode              Cause of Exception
                             Real  Virt  Prot
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                               A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  A     Memory address exceeding data segment limit or non-canonical.
                                         A     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     When alignment checking enabled:
                                               • 128-bit memory operand not 16-byte aligned.
                                               • 256-bit memory operand not 32-byte aligned.
Page fault, #PF                          A     Instruction execution caused a page fault.

X — AVX, AVX2, and SSE exception 

A — AVX and AVX2 exception 

S — SSE exception 
PSUBB Packed Subtract

VPSUBB Bytes 

Subtracts 16 or 32 packed 8-bit integer values in the second source operand from the corresponding 
values in the first source operand and writes the integer differences to the corresponding bytes of the 
destination. 

This instruction operates on both signed and unsigned integers. When a result overflows, the carry is 
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 8 bits of each 
result are written to the destination. 
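
The wraparound behavior can be sketched in scalar C. This models the 128-bit case on sixteen bytes; the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of PSUBB: each difference is truncated to its
 * low-order 8 bits, and no flag records the borrow or overflow. */
static void psubb_model(uint8_t dst[16], const uint8_t a[16], const uint8_t b[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = (uint8_t)(a[i] - b[i]);
}
```

For example, 00h minus 01h gives FFh, and 80h minus 01h gives 7Fh. Read as signed values, the second case is -128 minus 1 wrapping to +127.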


There are legacy and extended forms of the instruction: 

PSUBB 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPSUBB 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register
that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support

Form               Subset   Feature Flag
PSUBB              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBB 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBB 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                         Opcode           Description
PSUBB xmm1, xmm2/mem128          66 0F F8 /r      Subtracts 8-bit signed integer values in xmm2 or
                                                  mem128 from corresponding values in xmm1.
                                                  Writes differences to xmm1.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSUBB xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   F8 /r
VPSUBB ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src1.1.01   F8 /r
Related Instructions

(V)PSUBD, (V)PSUBQ, (V)PSUBSB, (V)PSUBSW, (V)PSUBUSB, (V)PSUBUSW, (V)PSUBW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

Exception                    Mode              Cause of Exception
                             Real  Virt  Prot
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PSUBD Packed Subtract

VPSUBD Doublewords 

Subtracts four or eight packed 32-bit integer values in the second source operand from the corresponding
values in the first source operand and writes the integer differences to the corresponding
doubleword of the destination. 

This instruction operates on both signed and unsigned integers. When a result overflows, the carry is 
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 8 bits of each 
result are written to the destination. 
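The wraparound behavior described above can be modeled in a few lines. The following is a minimal sketch, not an implementation of the hardware: the function name `packed_sub` and the convention of passing a whole register as one Python integer are assumptions made for illustration. It shows that the borrow out of each element is discarded and that the same bit pattern is correct under both signed and unsigned interpretation.

```python
def packed_sub(src1: int, src2: int, elem_bits: int, total_bits: int = 128) -> int:
    """Hypothetical model: dest[i] = (src1[i] - src2[i]) mod 2**elem_bits for every element."""
    mask = (1 << elem_bits) - 1
    dest = 0
    for shift in range(0, total_bits, elem_bits):
        a = (src1 >> shift) & mask
        b = (src2 >> shift) & mask
        # The borrow is dropped here, so it never reaches the neighbouring element.
        dest |= ((a - b) & mask) << shift
    return dest


# Two doublewords in the low 64 bits: element 0 is 0 - 1, element 1 is 5 - 3.
a = (5 << 32) | 0
b = (3 << 32) | 1
result = packed_sub(a, b, elem_bits=32)
print(hex(result))
```

Element 0 wraps to FFFFFFFFh, which reads as -1 when treated as signed and as 2^32 - 1 when treated as unsigned; element 1 is unaffected by the borrow. Passing `elem_bits=8`, `16`, or `64` models the byte, word, and quadword variants under the same assumptions.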


There are legacy and extended forms of the instruction: 

PSUBD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPSUBD

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 
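The three forms differ in how they treat bits [255:128] of the destination. The sketch below illustrates that difference only; `write_dest` and its `form` labels are hypothetical names, and a YMM register is represented as a single Python integer.

```python
LOW128 = (1 << 128) - 1
ALL256 = (1 << 256) - 1


def write_dest(old_ymm: int, result: int, form: str) -> int:
    """Hypothetical model of the destination register after the result is written.

    form is 'legacy' (PSUBD), 'vex128' (VPSUBD, XMM encoding),
    or 'vex256' (VPSUBD, YMM encoding).
    """
    if form == "legacy":
        # Bits [255:128] of the corresponding YMM register are not affected.
        return (old_ymm & ALL256 & ~LOW128) | (result & LOW128)
    if form == "vex128":
        # Bits [255:128] are cleared.
        return result & LOW128
    if form == "vex256":
        # All 256 bits are written.
        return result & ALL256
    raise ValueError(f"unknown form: {form}")


old = (0xABCD << 128) | 0x1111
print(hex(write_dest(old, 0x2222, "legacy")))
print(hex(write_dest(old, 0x2222, "vex128")))
```

The legacy call keeps ABCDh in the upper half, while the VEX-encoded 128-bit call returns only 2222h.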


Instruction Support 


Form             Subset   Feature Flag
PSUBD            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBD 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBD 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
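As an illustration of how the feature flags listed above are tested, the sketch below checks the three bit positions against register values that the caller is assumed to have already obtained with the CPUID instruction. The helper name `psubd_forms_reported` is hypothetical. The sketch covers only the CPUID feature flags; the operating-system enablement conditions listed in the exception table (for example CR4.OSXSAVE) are outside its scope.

```python
# Bit positions taken from the Instruction Support table.
SSE2_BIT_FN1_EDX = 26   # CPUID Fn0000_0001_EDX[SSE2]
AVX_BIT_FN1_ECX = 28    # CPUID Fn0000_0001_ECX[AVX]
AVX2_BIT_FN7_EBX = 5    # CPUID Fn0000_0007_EBX[AVX2]_x0


def psubd_forms_reported(fn1_ecx: int, fn1_edx: int, fn7_ebx_x0: int) -> list:
    """Hypothetical helper: list the PSUBD/VPSUBD forms whose feature flag is set.

    The arguments are raw register values; this function only tests bits
    and does not execute CPUID itself.
    """
    forms = []
    if (fn1_edx >> SSE2_BIT_FN1_EDX) & 1:
        forms.append("PSUBD")
    if (fn1_ecx >> AVX_BIT_FN1_ECX) & 1:
        forms.append("VPSUBD 128-bit")
    if (fn7_ebx_x0 >> AVX2_BIT_FN7_EBX) & 1:
        forms.append("VPSUBD 256-bit")
    return forms


# Example register values with SSE2 and AVX set but AVX2 clear.
print(psubd_forms_reported(fn1_ecx=1 << 28, fn1_edx=1 << 26, fn7_ebx_x0=0))
```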


Instruction Encoding 

Mnemonic                         Opcode        Description
PSUBD xmm1, xmm2/mem128          66 0F FA /r   Subtracts packed 32-bit integer values in xmm2 or
                                               mem128 from corresponding values in xmm1. Writes the
                                               differences to xmm1.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSUBD xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   FA /r
VPSUBD ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src1.1.01   FA /r
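To make the VEX columns of the encoding table concrete, the sketch below assembles the bytes of a register-to-register VPSUBD using the three-byte (C4h) prefix. It relies on general VEX encoding rules that this page does not restate: the R, X, B, and vvvv fields are stored inverted, and the W bit, marked X (ignored) in the table, is emitted as 0. The function name `encode_vex3_reg` is hypothetical, and memory operands are not handled.

```python
def encode_vex3_reg(opcode: int, dest: int, src1: int, src2: int, l_bit: int,
                    map_select: int = 0b00001, pp: int = 0b01, w: int = 0) -> bytes:
    """Hypothetical encoder for a three-operand, register-direct VEX instruction."""
    for reg in (dest, src1, src2):
        if not 0 <= reg <= 15:
            raise ValueError("register number out of range")
    r = (dest >> 3) & 1   # extends ModRM.reg
    x = 0                 # no SIB index in register-direct form
    b = (src2 >> 3) & 1   # extends ModRM.rm
    # Byte 1: inverted R, X, B in bits 7:5, map_select in bits 4:0.
    byte1 = ((r ^ 1) << 7) | ((x ^ 1) << 6) | ((b ^ 1) << 5) | map_select
    # Byte 2: W, inverted vvvv (first source register), L, pp.
    byte2 = (w << 7) | ((~src1 & 0xF) << 3) | (l_bit << 2) | pp
    # ModRM: mod = 11b (register), reg = destination, rm = second source.
    modrm = (0b11 << 6) | ((dest & 7) << 3) | (src2 & 7)
    return bytes([0xC4, byte1, byte2, opcode, modrm])


# VPSUBD xmm1, xmm2, xmm3 and VPSUBD ymm1, ymm2, ymm3 (opcode FAh).
print(encode_vex3_reg(0xFA, 1, 2, 3, l_bit=0).hex())
print(encode_vex3_reg(0xFA, 1, 2, 3, l_bit=1).hex())
```

Assemblers commonly choose the shorter two-byte (C5h) prefix for these operands when the encoding permits it; the three-byte form is used here because it is the one shown in the table.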

Related Instructions

(V)PSUBB, (V)PSUBQ, (V)PSUBSB, (V)PSUBSW, (V)PSUBUSB, (V)PSUBUSW, (V)PSUBW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                               Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PSUBQ Packed Subtract

VPSUBQ Quadword

Subtracts two or four packed 64-bit integer values in the second source operand from the
corresponding values in the first source operand and writes the differences to the corresponding
quadword of the destination.

This instruction operates on both signed and unsigned integers. When a result overflows, the carry is
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 64 bits of each
result are written to the destination.


There are legacy and extended forms of the instruction: 

PSUBQ 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPSUBQ

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form             Subset   Feature Flag
PSUBQ            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBQ 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBQ 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                         Opcode        Description
PSUBQ xmm1, xmm2/mem128          66 0F FB /r   Subtracts packed 64-bit integer values in xmm2 or
                                               mem128 from corresponding values in xmm1. Writes the
                                               differences to xmm1.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSUBQ xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   FB /r
VPSUBQ ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src1.1.01   FB /r

Related Instructions

(V)PSUBB, (V)PSUBD, (V)PSUBSB, (V)PSUBSW, (V)PSUBUSB, (V)PSUBUSW, (V)PSUBW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                               Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PSUBSB Packed Subtract Signed With Saturation

VPSUBSB Bytes

Subtracts 16 or 32 packed 8-bit signed integer values in the second source operand from the
corresponding values in the first source operand and writes the signed integer differences to the
corresponding byte of the destination.

For each packed value in the destination, if the value is larger than the largest signed 8-bit integer, it is 
saturated to 7Fh, and if the value is smaller than the smallest signed 8-bit integer, it is saturated to 
80h. 
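The saturation rule can be modeled as a clamp applied to each element after a full-precision subtraction. The following is a minimal sketch under that reading; the names `to_signed` and `packed_sub_signed_sat` are hypothetical, and a register is represented as one Python integer.

```python
def to_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned field of the given width as a two's-complement integer."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def packed_sub_signed_sat(src1: int, src2: int, elem_bits: int = 8,
                          total_bits: int = 128) -> int:
    """Hypothetical model of a signed saturating packed subtract."""
    mask = (1 << elem_bits) - 1
    largest = (1 << (elem_bits - 1)) - 1    # 7Fh for bytes
    smallest = -(1 << (elem_bits - 1))      # encoded as 80h for bytes
    dest = 0
    for shift in range(0, total_bits, elem_bits):
        a = to_signed((src1 >> shift) & mask, elem_bits)
        b = to_signed((src2 >> shift) & mask, elem_bits)
        diff = max(smallest, min(largest, a - b))   # clamp instead of wrapping
        dest |= (diff & mask) << shift
    return dest


# Byte 0: 127 - (-1) saturates to 7Fh. Byte 1: -128 - 1 saturates to 80h. Byte 2: 5 - 3 = 2.
print(hex(packed_sub_signed_sat(0x05807F, 0x0301FF)))
```

Calling the same function with `elem_bits=16` models the word variant, where the limits become 7FFFh and 8000h.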

There are legacy and extended forms of the instruction: 

PSUBSB 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPSUBSB 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 

Instruction Support 


Form              Subset   Feature Flag
PSUBSB            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBSB 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBSB 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                          Opcode        Description
PSUBSB xmm1, xmm2/mem128          66 0F E8 /r   Subtracts packed 8-bit signed integer values in xmm2 or
                                                mem128 from corresponding values in xmm1. Writes the
                                                differences to xmm1.

Mnemonic                          Encoding
                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSUBSB xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   E8 /r
VPSUBSB ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src1.1.01   E8 /r

Related Instructions

(V)PSUBB, (V)PSUBD, (V)PSUBQ, (V)PSUBSW, (V)PSUBUSB, (V)PSUBUSW, (V)PSUBW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                               Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PSUBSW Packed Subtract Signed With Saturation

VPSUBSW Words

Subtracts eight or sixteen packed 16-bit signed integer values in the second source operand from the
corresponding values in the first source operand and writes the signed integer differences to the
corresponding word of the destination.

Positive differences greater than 7FFFh are saturated to 7FFFh; negative differences less than 8000h 
are saturated to 8000h. 


There are legacy and extended forms of the instruction: 

PSUBSW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPSUBSW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form              Subset   Feature Flag
PSUBSW            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBSW 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBSW 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                          Opcode        Description
PSUBSW xmm1, xmm2/mem128          66 0F E9 /r   Subtracts packed 16-bit signed integer values in xmm2 or
                                                mem128 from corresponding values in xmm1. Writes the
                                                differences to xmm1.

Mnemonic                          Encoding
                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSUBSW xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   E9 /r
VPSUBSW ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src1.1.01   E9 /r

Related Instructions

(V)PSUBB, (V)PSUBD, (V)PSUBQ, (V)PSUBSB, (V)PSUBUSB, (V)PSUBUSW, (V)PSUBW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                               Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PSUBUSB Packed Subtract Unsigned With Saturation

VPSUBUSB Bytes

Subtracts 16 or 32 packed 8-bit unsigned integer values in the second source operand from the
corresponding values in the first source operand and writes the unsigned integer differences to the
corresponding byte of the destination.

Differences less than 00h are saturated to 00h.
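For unsigned operands the only way a difference can leave the representable range is by going negative, so saturation reduces to clamping at zero. The sketch below models that behavior; `packed_sub_unsigned_sat` is a hypothetical name and a register is represented as one Python integer.

```python
def packed_sub_unsigned_sat(src1: int, src2: int, elem_bits: int = 8,
                            total_bits: int = 128) -> int:
    """Hypothetical model of an unsigned saturating packed subtract."""
    mask = (1 << elem_bits) - 1
    dest = 0
    for shift in range(0, total_bits, elem_bits):
        a = (src1 >> shift) & mask
        b = (src2 >> shift) & mask
        diff = a - b
        if diff < 0:
            diff = 0    # saturate instead of wrapping around
        dest |= diff << shift
    return dest


# Byte 0: 10h - 20h saturates to 00h. Byte 1: FFh - 01h = FEh.
print(hex(packed_sub_unsigned_sat(0xFF10, 0x0120)))
```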


There are legacy and extended forms of the instruction: 

PSUBUSB 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPSUBUSB 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form               Subset   Feature Flag
PSUBUSB            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBUSB 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBUSB 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                           Opcode        Description
PSUBUSB xmm1, xmm2/mem128          66 0F D8 /r   Subtracts packed byte unsigned integer values in
                                                 xmm2 or mem128 from corresponding values in xmm1.
                                                 Writes the differences to xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSUBUSB xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   D8 /r
VPSUBUSB ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src1.1.01   D8 /r

Related Instructions

(V)PSUBB, (V)PSUBD, (V)PSUBQ, (V)PSUBSB, (V)PSUBSW, (V)PSUBUSW, (V)PSUBW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                               Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 




PSUBUSW Packed Subtract Unsigned With Saturation 

VPSUBUSW Words 

Subtracts eight or sixteen packed 16-bit unsigned integer values in the second source operand from the
corresponding values in the first source operand and writes the unsigned integer differences to the
corresponding word of the destination.

Differences less than 0000h are saturated to 0000h.
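One consequence of clamping at zero is that, for any pair of unsigned words, at most one of a - b and b - a survives saturation. The sketch below uses that property to compute an absolute difference with two saturating subtracts and an OR. This is offered as an illustration of the saturation rule rather than something described in this section; `subus16` and `abs_diff_u16` are hypothetical names, each modeling a single word lane with inputs assumed to lie in the range 0000h to FFFFh.

```python
def subus16(a: int, b: int) -> int:
    """One word lane of an unsigned saturating subtract: negative differences become 0000h."""
    return max(a - b, 0) & 0xFFFF


def abs_diff_u16(a: int, b: int) -> int:
    """|a - b| for unsigned words: one of the two saturated differences is always zero."""
    return subus16(a, b) | subus16(b, a)


print(abs_diff_u16(3, 10))
print(abs_diff_u16(10, 3))
```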


There are legacy and extended forms of the instruction: 

PSUBUSW 

The first source operand is an XMM register and the second source operand is an XMM register or 
128-bit memory location. The first source operand is also the destination register. Bits [255:128] of 
the YMM register that corresponds to the destination are not affected. 

VPSUBUSW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form               Subset   Feature Flag
PSUBUSW            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBUSW 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBUSW 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                           Opcode        Description
PSUBUSW xmm1, xmm2/mem128          66 0F D9 /r   Subtracts packed 16-bit unsigned integer values in
                                                 xmm2 or mem128 from corresponding values in
                                                 xmm1. Writes the differences to xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSUBUSW xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   D9 /r
VPSUBUSW ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src1.1.01   D9 /r

Related Instructions

(V)PSUBB, (V)PSUBD, (V)PSUBQ, (V)PSUBSB, (V)PSUBSW, (V)PSUBUSB, (V)PSUBW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                               Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PSUBW Packed Subtract

VPSUBW Words

Subtracts eight or sixteen packed 16-bit integer values in the second source operand from the
corresponding values in the first source operand and writes the integer differences to the
corresponding word of the destination.

This instruction operates on both signed and unsigned integers. When a result overflows, the carry is
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 16 bits of each
result are written to the destination.


There are legacy and extended forms of the instruction: 

PSUBW 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VPSUBW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form             Subset   Feature Flag
PSUBW            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBW 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBW 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                         Opcode        Description
PSUBW xmm1, xmm2/mem128          66 0F F9 /r   Subtracts packed 16-bit integer values in xmm2 or
                                               mem128 from corresponding values in xmm1. Writes the
                                               differences to xmm1.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSUBW xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   F9 /r
VPSUBW ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src1.1.01   F9 /r

Related Instructions

(V)PSUBB, (V)PSUBD, (V)PSUBQ, (V)PSUBSB, (V)PSUBSW, (V)PSUBUSB, (V)PSUBUSW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                               Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PTEST Packed Bit Test

VPTEST

First, performs a bitwise AND of the first source operand with the second source operand.
Sets rFLAGS.ZF when every bit of the result is 0; otherwise, clears ZF.

Second, performs a bitwise AND of the second source operand with the logical complement (NOT)
of the first source operand. Sets rFLAGS.CF when every bit of the result is 0; otherwise, clears CF.

Neither source operand is modified.
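The two tests described above can be written out directly. The following is a minimal sketch of the flag computation only; the function name `ptest` and the representation of each operand as a single Python integer are assumptions made for illustration.

```python
def ptest(src1: int, src2: int, total_bits: int = 128) -> tuple:
    """Hypothetical model of the flag results: returns (ZF, CF)."""
    mask = (1 << total_bits) - 1
    src1 &= mask
    src2 &= mask
    # ZF: set when src1 AND src2 is all zeros.
    zf = int((src1 & src2) == 0)
    # CF: set when src2 AND (NOT src1) is all zeros.
    cf = int((src2 & ~src1 & mask) == 0)
    return zf, cf


print(ptest(0xF0, 0x0F))   # operands share no set bits
print(ptest(0xFF, 0x0F))   # every set bit of src2 is also set in src1
```

Read this way, ZF = 1 reports that the operands have no set bits in common, and CF = 1 reports that the set bits of the second operand are a subset of those of the first.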


There are legacy and extended forms of the instruction:

PTEST 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. 

VPTEST 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. 

YMM Encoding 

The first source operand is a YMM register. The second source operand is a YMM register or 256-bit 
memory location. 


Instruction Support 


Form     Subset   Feature Flag
PTEST    SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPTEST   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                    Opcode           Description
PTEST xmm1, xmm2/mem128     66 0F 38 17 /r   Set ZF if bitwise AND of xmm2/m128 with xmm1 = 0;
                                             else, clear ZF.
                                             Set CF if bitwise AND of xmm2/m128 with NOT xmm1 = 0;
                                             else, clear CF.

Mnemonic                    Encoding
                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPTEST xmm1, xmm2/mem128    C4    RXB.00010        X.1111.0.01   17 /r
VPTEST ymm1, ymm2/mem256    C4    RXB.00010        X.1111.1.01   17 /r


Related Instructions 

VTESTPD, VTESTPS 

rFLAGS Affected

ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          0                   0    M    0    0    M
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set or cleared is M (modified). Unaffected
flags are blank. Undefined flags are U.


MXCSR Flags Affected 

None 


Exceptions 


                               Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 




PUNPCKHBW Unpack and Interleave

VPUNPCKHBW High Bytes

Unpacks the 8 high-order bytes of each octword of the first and second source operands and interleaves
the bytes as they are copied to the destination. The low-order bytes of each octword of the source
operands are ignored.

Bytes are interleaved in ascending order from the least-significant byte of the upper 8 bytes of each 
octword of the source operands with bytes from the first source operand occupying the lower byte of 
each pair copied to the destination. 

For the 128-bit form of the instruction, the following operations are performed: 

dest[7:0] = src1[71:64]
dest[15:8] = src2[71:64]
dest[23:16] = src1[79:72]
dest[31:24] = src2[79:72]
dest[39:32] = src1[87:80]
dest[47:40] = src2[87:80]
dest[55:48] = src1[95:88]
dest[63:56] = src2[95:88]
dest[71:64] = src1[103:96]
dest[79:72] = src2[103:96]
dest[87:80] = src1[111:104]
dest[95:88] = src2[111:104]
dest[103:96] = src1[119:112]
dest[111:104] = src2[119:112]
dest[119:112] = src1[127:120]
dest[127:120] = src2[127:120]

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[135:128] = src1[199:192]
dest[143:136] = src2[199:192]
dest[151:144] = src1[207:200]
dest[159:152] = src2[207:200]
dest[167:160] = src1[215:208]
dest[175:168] = src2[215:208]
dest[183:176] = src1[223:216]
dest[191:184] = src2[223:216]
dest[199:192] = src1[231:224]
dest[207:200] = src2[231:224]
dest[215:208] = src1[239:232]
dest[223:216] = src2[239:232]
dest[231:224] = src1[247:240]
dest[239:232] = src2[247:240]
dest[247:240] = src1[255:248]
dest[255:248] = src2[255:248]


When the second source operand is all 0s, the destination effectively contains the 8 high-order bytes 
from the first source operand or the 8 high-order bytes from both octwords of the first source operand 
zero-extended to 16 bits. This operation is useful for expanding unsigned 8-bit values to unsigned 
16-bit operands for subsequent processing that requires higher precision. 
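The per-byte assignments listed above follow a regular pattern that a short loop can reproduce. The sketch below models it, including the zero-extension use just described; `punpckhbw` is a hypothetical name and each register is represented as one Python integer.

```python
def punpckhbw(src1: int, src2: int, total_bits: int = 128) -> int:
    """Hypothetical model: interleave the 8 high-order bytes of each octword."""
    dest = 0
    for lane in range(0, total_bits, 128):      # one pass per 128-bit octword
        for i in range(8):
            src_shift = lane + 64 + 8 * i       # i-th byte of the upper half
            b1 = (src1 >> src_shift) & 0xFF
            b2 = (src2 >> src_shift) & 0xFF
            dest |= b1 << (lane + 16 * i)       # src1 byte takes the low byte of the pair
            dest |= b2 << (lane + 16 * i + 8)   # src2 byte takes the high byte of the pair
    return dest


# src1 holds the byte values 0..15 in ascending order; src2 is all zeros.
src1 = int.from_bytes(bytes(range(16)), "little")
widened = punpckhbw(src1, 0)
print(widened.to_bytes(16, "little").hex())
```

With an all-zero second operand, each of the bytes 08h through 0Fh ends up in the low byte of its own 16-bit word, which is the zero-extension use described above.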

There are legacy and extended forms of the instruction:

PUNPCKHBW 

The first source operand is an XMM register and the second source operand is either an XMM register
or a 128-bit memory location. The first source operand is also the destination register. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.

VPUNPCKHBW 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support

Form                   Subset   Feature Flag
PUNPCKHBW              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPUNPCKHBW 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPUNPCKHBW 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
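
As a rough illustration of how the three feature flags map to instruction forms, the sketch below decodes the bits from raw CPUID register values. It is an assumption-laden example: Python cannot execute CPUID directly, so the register values are passed in as integers obtained by some other means, and the function name supported_forms and the sample values are hypothetical. It also checks only the feature bits; operating-system enablement (for example CR4.OSXSAVE) is a separate condition.

```python
SSE2_BIT = 26  # CPUID Fn0000_0001_EDX[SSE2]
AVX_BIT = 28   # CPUID Fn0000_0001_ECX[AVX]
AVX2_BIT = 5   # CPUID Fn0000_0007_EBX[AVX2]_x0


def supported_forms(fn1_edx: int, fn1_ecx: int, fn7_ebx_x0: int) -> dict:
    """Map raw CPUID register values to the forms listed in the support table."""
    def bit(reg: int, n: int) -> bool:
        return bool((reg >> n) & 1)

    return {
        "PUNPCKHBW": bit(fn1_edx, SSE2_BIT),
        "VPUNPCKHBW 128-bit": bit(fn1_ecx, AVX_BIT),
        "VPUNPCKHBW 256-bit": bit(fn7_ebx_x0, AVX2_BIT),
    }


# Made-up register values: SSE2 and AVX reported, AVX2 not reported.
forms = supported_forms(fn1_edx=1 << 26, fn1_ecx=1 << 28, fn7_ebx_x0=0)
```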


Instruction Encoding

Mnemonic                              Opcode        Description
PUNPCKHBW xmm1, xmm2/mem128           66 0F 68 /r   Unpacks and interleaves the high-order bytes of
                                                    xmm1 and xmm2 or mem128. Writes the bytes to
                                                    xmm1.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPUNPCKHBW xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   68 /r
VPUNPCKHBW ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   68 /r
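
To make the encoding fields concrete, the following sketch assembles the three-byte VEX form for a register-to-register operand combination. It is a minimal illustration under stated assumptions: the helper vex3_punpck is hypothetical, memory operands are not handled, W is set to 0 on the assumption that "X" in the table marks an ignored bit, and the R, X, B, and vvvv fields are stored in inverted (ones' complement) form as in the general VEX prefix layout.

```python
def vex3_punpck(opcode: int, dest: int, src1: int, src2: int, is_256: bool) -> bytes:
    """Assemble C4 RXB.map_select W.vvvv.L.pp opcode /r for register operands."""
    for reg in (dest, src1, src2):
        assert 0 <= reg <= 15
    r_bar = ~(dest >> 3) & 1  # inverted high bit of the ModRM.reg register number
    x_bar = 1                 # no SIB index register is used, so the inverted bit is 1
    b_bar = ~(src2 >> 3) & 1  # inverted high bit of the ModRM.r/m register number
    map_select = 0b00001      # the ".01" field in the table
    byte1 = (r_bar << 7) | (x_bar << 6) | (b_bar << 5) | map_select
    w = 0                     # assumed ignored for this instruction
    vvvv = ~src1 & 0xF        # first source register, ones' complement
    vex_l = 1 if is_256 else 0
    pp = 0b01                 # the ".01" pp field in the table
    byte2 = (w << 7) | (vvvv << 3) | (vex_l << 2) | pp
    modrm = 0xC0 | ((dest & 7) << 3) | (src2 & 7)  # mod = 11b: register operand
    return bytes([0xC4, byte1, byte2, opcode, modrm])


xmm_form = vex3_punpck(0x68, dest=1, src1=2, src2=3, is_256=False)
ymm_form = vex3_punpck(0x68, dest=1, src1=2, src2=3, is_256=True)
```

For VPUNPCKHBW xmm1, xmm2, xmm3 this model yields the bytes C4 E1 69 68 CB, and for the ymm1, ymm2, ymm3 form C4 E1 6D 68 CB. Assemblers may emit an equivalent shorter two-byte VEX prefix when the fields allow it.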


Related Instructions 

(V)PUNPCKHDQ, (V)PUNPCKHQDQ, (V)PUNPCKHWD, (V)PUNPCKLBW, (V)PUNPCKLDQ, 
(V)PUNPCKLQDQ, (V)PUNPCKLWD 

rFLAGS Affected 

None 
MXCSR Flags Affected

None 


Exceptions

                                Mode
Exception                   Real Virt Prot   Cause of Exception
Invalid opcode, #UD          X    X    X     Instruction not supported, as indicated by CPUID feature identifier.
                             A    A          AVX instructions are only recognized in protected mode.
                             S    S    S     CR0.EM = 1.
                             S    S    S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S    S    X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S    S    X     CR0.TS = 1.
Stack, #SS                   S    S    X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S    S    X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC         S    S    S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S    X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PUNPCKHDQ
VPUNPCKHDQ

Unpack and Interleave High Doublewords

Unpacks the two high-order doublewords of each octword of the first and second source operands and 
interleaves the doublewords as they are copied to the destination. The low-order doublewords of each 
octword of the source operands are ignored. 

Doublewords are interleaved in ascending order from the least-significant doubleword of the high
quadword of each octword, with doublewords from the first source operand occupying the lower
doubleword of each pair copied to the destination.

For the 128-bit form of the instruction, the following operations are performed: 

dest[31:0] = src1[95:64]
dest[63:32] = src2[95:64]
dest[95:64] = src1[127:96]
dest[127:96] = src2[127:96]

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[159:128] = src1[223:192]
dest[191:160] = src2[223:192]
dest[223:192] = src1[255:224]
dest[255:224] = src2[255:224]

When the second source operand is all 0s, the destination effectively receives the 2 high-order
doublewords of the first source operand (128-bit form), or the 2 high-order doublewords of each
octword of the first source operand (256-bit form), with each doubleword zero-extended to 64 bits.
This operation is useful for expanding unsigned 32-bit values to unsigned 64-bit operands for
subsequent processing that requires higher precision.
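
The bit-range assignments listed above can be transcribed almost literally into a software model. The sketch below does so for the 256-bit form, treating each operand as a Python integer whose bit 0 is the register's bit 0. The helper names bits and vpunpckhdq_256 are hypothetical, and the model covers data movement only.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Return value[hi:lo] as an integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def vpunpckhdq_256(src1: int, src2: int) -> int:
    """Model of the 256-bit form: the same pattern is applied to each octword."""
    dest = 0
    for lane in (0, 128):  # bit offset of each octword
        dwords = [
            bits(src1, lane + 95, lane + 64),
            bits(src2, lane + 95, lane + 64),
            bits(src1, lane + 127, lane + 96),
            bits(src2, lane + 127, lane + 96),
        ]
        for i, dword in enumerate(dwords):
            dest |= dword << (lane + 32 * i)
    return dest


# Sample operands: doubleword i of src1 is 10h + i, doubleword i of src2 is 20h + i.
src1 = sum((0x10 + i) << (32 * i) for i in range(8))
src2 = sum((0x20 + i) << (32 * i) for i in range(8))
dest = vpunpckhdq_256(src1, src2)
dest_dwords = [bits(dest, 32 * i + 31, 32 * i) for i in range(8)]
```

With these sample operands, the destination doublewords from least to most significant are 12h, 22h, 13h, 23h, 16h, 26h, 17h, 27h.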


There are legacy and extended forms of the instruction: 

PUNPCKHDQ 

The first source operand is an XMM register and the second source operand is an XMM register or 
128-bit memory location. The first source operand is also the destination register. Bits [255:128] of 
the YMM register that corresponds to the destination are not affected. 

VPUNPCKHDQ 

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 




Instruction Support

Form                   Subset   Feature Flag
PUNPCKHDQ              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPUNPCKHDQ 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPUNPCKHDQ 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                              Opcode        Description
PUNPCKHDQ xmm1, xmm2/mem128           66 0F 6A /r   Unpacks and interleaves the high-order doublewords
                                                    of xmm1 and xmm2 or mem128. Writes the
                                                    doublewords to xmm1.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPUNPCKHDQ xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   6A /r
VPUNPCKHDQ ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   6A /r


Related Instructions 

(V)PUNPCKHBW, (V)PUNPCKHQDQ, (V)PUNPCKHWD, (V)PUNPCKLBW, (V)PUNPCKLDQ, 
(V)PUNPCKLQDQ, (V)PUNPCKLWD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                Mode
Exception                   Real Virt Prot   Cause of Exception
Invalid opcode, #UD          X    X    X     Instruction not supported, as indicated by CPUID feature identifier.
                             A    A          AVX instructions are only recognized in protected mode.
                             S    S    S     CR0.EM = 1.
                             S    S    S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S    S    X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S    S    X     CR0.TS = 1.
Stack, #SS                   S    S    X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S    S    X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC         S    S    S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S    X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 




PUNPCKHQDQ
VPUNPCKHQDQ

Unpack and Interleave High Quadwords

Unpacks the high-order quadword of each octword of the first and second source operands and
interleaves the quadwords as they are copied to the destination. The low-order quadword of each
octword of the source operands is ignored.

Quadwords are interleaved in ascending order, with the high-order quadword of the first source
operand (or of each octword of the first source operand, for the 256-bit form) occupying the lower
quadword of the corresponding octword of the destination.

For the 128-bit form of the instruction, the following operations are performed: 

dest[63:0] = src1[127:64]
dest[127:64] = src2[127:64]

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[191:128] = src1[255:192]
dest[255:192] = src2[255:192]

When the second source operand is all 0s, the destination effectively receives the high-order
quadword of the first source operand (128-bit form), or the high-order quadword of each octword of
the first source operand (256-bit form), zero-extended to 128 bits. This operation is useful for
expanding unsigned 64-bit values to unsigned 128-bit operands for subsequent processing that
requires higher precision.
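
A compact way to picture the operation is to model each operand as a list of quadwords, least-significant first. The sketch below is a hypothetical model (the name punpckhqdq and the list representation are assumptions for illustration); a 2-element list stands for an XMM operand and a 4-element list for a YMM operand.

```python
def punpckhqdq(src1: list, src2: list) -> list:
    """Take the high quadword of each octword of src1, then of src2."""
    assert len(src1) == len(src2) and len(src1) in (2, 4)
    dest = []
    for base in range(0, len(src1), 2):  # one iteration per octword
        dest.append(src1[base + 1])      # lower quadword of the destination octword
        dest.append(src2[base + 1])      # upper quadword of the destination octword
    return dest


xmm_result = punpckhqdq([1, 2], [3, 4])
ymm_result = punpckhqdq([1, 2, 3, 4], [5, 6, 7, 8])
zero_extended = punpckhqdq([0xAAAA, 0xBBBB], [0, 0])
```

In this model the 128-bit result is [2, 4], the 256-bit result is [2, 6, 4, 8], and a second operand of all 0s leaves the high quadword BBBBh in the low position with zero above it.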


There are legacy and extended forms of the instruction:

PUNPCKHQDQ 

The first source operand is an XMM register and the second source operand is an XMM register or 
128-bit memory location. The first source operand is also the destination register. Bits [255:128] of 
the YMM register that corresponds to the destination are not affected. 

VPUNPCKHQDQ 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support

Form                   Subset   Feature Flag
PUNPCKHQDQ             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPUNPCKHQDQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPUNPCKHQDQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                              Opcode        Description
PUNPCKHQDQ xmm1, xmm2/mem128          66 0F 6D /r   Unpacks and interleaves the high-order
                                                    quadwords of xmm1 and xmm2 or mem128.
                                                    Writes the quadwords to xmm1.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPUNPCKHQDQ xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   6D /r
VPUNPCKHQDQ ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src1.1.01   6D /r


Related Instructions 

(V)PUNPCKHBW, (V)PUNPCKHDQ, (V)PUNPCKHWD, (V)PUNPCKLBW, (V)PUNPCKLDQ, 
(V)PUNPCKLQDQ, (V)PUNPCKLWD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                Mode
Exception                   Real Virt Prot   Cause of Exception
Invalid opcode, #UD          X    X    X     Instruction not supported, as indicated by CPUID feature identifier.
                             A    A          AVX instructions are only recognized in protected mode.
                             S    S    S     CR0.EM = 1.
                             S    S    S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S    S    X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S    S    X     CR0.TS = 1.
Stack, #SS                   S    S    X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S    S    X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC         S    S    S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S    X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PUNPCKHWD
VPUNPCKHWD

Unpack and Interleave High Words


Unpacks the 4 high-order words of each octword of the first and second source operands and
interleaves the words as they are copied to the destination. The low-order words of each octword of
the source operands are ignored.

Words are interleaved in ascending order from the least-significant word of the high quadword of 
each octword with words from the first source operand occupying the lower word of each pair copied 
to the destination. 

For the 128-bit form of the instruction, the following operations are performed: 

dest[15:0] = src1[79:64]
dest[31:16] = src2[79:64]
dest[47:32] = src1[95:80]
dest[63:48] = src2[95:80]
dest[79:64] = src1[111:96]
dest[95:80] = src2[111:96]
dest[111:96] = src1[127:112]
dest[127:112] = src2[127:112]

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[143:128] = src1[207:192]
dest[159:144] = src2[207:192]
dest[175:160] = src1[223:208]
dest[191:176] = src2[223:208]
dest[207:192] = src1[239:224]
dest[223:208] = src2[239:224]
dest[239:224] = src1[255:240]
dest[255:240] = src2[255:240]


When the second source operand is all 0s, the destination effectively receives the 4 high-order
words of the first source operand (128-bit form), or the 4 high-order words of each octword of the
first source operand (256-bit form), with each word zero-extended to 32 bits. This operation is
useful for expanding unsigned 16-bit values to unsigned 32-bit operands for subsequent processing
that requires higher precision.
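
The restriction to unsigned values matters because an all-0s second operand fills the upper half of each result with zeros rather than copies of the sign bit. The sketch below models the 128-bit form with Python's struct module to show this. The function name punpckhwd_128 and the sample data are assumptions for illustration.

```python
import struct


def punpckhwd_128(src1: bytes, src2: bytes) -> bytes:
    """Interleave words 4..7 of src1 and src2 (software model, 128-bit form)."""
    a = struct.unpack("<8H", src1)
    b = struct.unpack("<8H", src2)
    pairs = []
    for i in range(4, 8):
        pairs += [a[i], b[i]]  # src1 word is the lower word of each pair
    return struct.pack("<8H", *pairs)


# The four high-order words include 8000h and FFFFh, which are negative as signed words.
src1 = struct.pack("<8H", 0, 0, 0, 0, 0x0001, 0x7FFF, 0x8000, 0xFFFF)
dest = punpckhwd_128(src1, bytes(16))

unsigned_dwords = struct.unpack("<4I", dest)
signed_dwords = struct.unpack("<4i", dest)
```

Both interpretations of the result give 1, 32767, 32768, and 65535. The word FFFFh therefore widens to 65535, which is correct for unsigned data but not for a signed word holding -1.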


There are legacy and extended forms of the instruction: 

PUNPCKHWD 

The first source operand is an XMM register and the second source operand is either an XMM register
or a 128-bit memory location. The first source operand is also the destination register. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.

VPUNPCKHWD 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support

Form                   Subset   Feature Flag
PUNPCKHWD              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPUNPCKHWD 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPUNPCKHWD 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                              Opcode        Description
PUNPCKHWD xmm1, xmm2/mem128           66 0F 69 /r   Unpacks and interleaves the high-order words of
                                                    xmm1 and xmm2 or mem128. Writes the words to
                                                    xmm1.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPUNPCKHWD xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   69 /r
VPUNPCKHWD ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   69 /r


Related Instructions 

(V)PUNPCKHBW, (V)PUNPCKHDQ, (V)PUNPCKHQDQ, (V)PUNPCKLBW, (V)PUNPCKLDQ, 
(V)PUNPCKLQDQ, (V)PUNPCKLWD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

                                Mode
Exception                   Real Virt Prot   Cause of Exception
Invalid opcode, #UD          X    X    X     Instruction not supported, as indicated by CPUID feature identifier.
                             A    A          AVX instructions are only recognized in protected mode.
                             S    S    S     CR0.EM = 1.
                             S    S    S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S    S    X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S    S    X     CR0.TS = 1.
Stack, #SS                   S    S    X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S    S    X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC         S    S    S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S    X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PUNPCKLBW
VPUNPCKLBW

Unpack and Interleave Low Bytes


Unpacks the 8 low-order bytes of each octword of the first and second source operands and
interleaves the bytes as they are copied to the destination. The high-order bytes of each octword
are ignored.

Bytes are interleaved in ascending order from the least-significant byte of source operands with bytes 
from the first source operand occupying the lower byte of each pair copied to the destination. 

For the 128-bit form of the instruction, the following operations are performed: 

dest[7:0] = src1[7:0]
dest[15:8] = src2[7:0]
dest[23:16] = src1[15:8]
dest[31:24] = src2[15:8]
dest[39:32] = src1[23:16]
dest[47:40] = src2[23:16]
dest[55:48] = src1[31:24]
dest[63:56] = src2[31:24]
dest[71:64] = src1[39:32]
dest[79:72] = src2[39:32]
dest[87:80] = src1[47:40]
dest[95:88] = src2[47:40]
dest[103:96] = src1[55:48]
dest[111:104] = src2[55:48]
dest[119:112] = src1[63:56]
dest[127:120] = src2[63:56]

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[135:128] = src1[135:128]
dest[143:136] = src2[135:128]
dest[151:144] = src1[143:136]
dest[159:152] = src2[143:136]
dest[167:160] = src1[151:144]
dest[175:168] = src2[151:144]
dest[183:176] = src1[159:152]
dest[191:184] = src2[159:152]
dest[199:192] = src1[167:160]
dest[207:200] = src2[167:160]
dest[215:208] = src1[175:168]
dest[223:216] = src2[175:168]
dest[231:224] = src1[183:176]
dest[239:232] = src2[183:176]
dest[247:240] = src1[191:184]
dest[255:248] = src2[191:184]


When the second source operand is all 0s, the destination effectively receives the eight low-order 
bytes from the first source operand or the eight low-order bytes from both octwords of the first source 
operand zero-extended to 16 bits. This operation is useful for expanding unsigned 8-bit values to 
unsigned 16-bit operands for subsequent processing that requires higher precision. 
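
Because the low-byte and high-byte unpack operations each cover half of an octword, using both with an all-0s second operand widens all 16 bytes of an XMM-sized operand. The sketch below models that pairing. The helper unpack_bytes is hypothetical: its high flag selects between the low-byte behavior described here and the high-byte behavior of the related (V)PUNPCKHBW instruction.

```python
def unpack_bytes(src1: bytes, src2: bytes, high: bool) -> bytes:
    """Interleave 8 bytes of src1 and src2: bytes 0..7 if not high, else bytes 8..15."""
    assert len(src1) == 16 and len(src2) == 16
    start = 8 if high else 0
    out = bytearray()
    for i in range(start, start + 8):
        out.append(src1[i])  # lower byte of each pair
        out.append(src2[i])  # upper byte of each pair
    return bytes(out)


src = bytes(range(200, 216))  # sample unsigned 8-bit values 200..215
zero = bytes(16)
low_half = unpack_bytes(src, zero, high=False)   # models the low-byte unpack
high_half = unpack_bytes(src, zero, high=True)   # models the high-byte unpack

combined = low_half + high_half
widened = [int.from_bytes(combined[i:i + 2], "little") for i in range(0, 32, 2)]
```

The 16 resulting words carry the original values 200 through 215 in their original order.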
There are legacy and extended forms of the instruction:

PUNPCKLBW 

The first source operand is an XMM register and the second source operand is either an XMM register
or a 128-bit memory location. The first source operand is also the destination register. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.

VPUNPCKLBW 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support

Form                   Subset   Feature Flag
PUNPCKLBW              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPUNPCKLBW 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPUNPCKLBW 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                              Opcode        Description
PUNPCKLBW xmm1, xmm2/mem128           66 0F 60 /r   Unpacks and interleaves the low-order bytes of
                                                    xmm1 and xmm2 or mem128. Writes the bytes to
                                                    xmm1.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPUNPCKLBW xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   60 /r
VPUNPCKLBW ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   60 /r


Related Instructions 

(V)PUNPCKHBW, (V)PUNPCKHDQ, (V)PUNPCKHQDQ, (V)PUNPCKHWD, (V)PUNPCKLDQ,
(V)PUNPCKLQDQ, (V)PUNPCKLWD

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                Mode
Exception                   Real Virt Prot   Cause of Exception
Invalid opcode, #UD          X    X    X     Instruction not supported, as indicated by CPUID feature identifier.
                             A    A          AVX instructions are only recognized in protected mode.
                             S    S    S     CR0.EM = 1.
                             S    S    S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S    S    X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S    S    X     CR0.TS = 1.
Stack, #SS                   S    S    X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S    S    X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC         S    S    S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S    X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PUNPCKLDQ
VPUNPCKLDQ

Unpack and Interleave Low Doublewords

Unpacks the two low-order doublewords of each octword of the first and second source operands and 
interleaves the doublewords as they are copied to the destination. The high-order doublewords of 
each octword of the source operands are ignored. 

Doublewords are interleaved in ascending order from the least-significant doubleword of the sources 
with doublewords from the first source operand occupying the lower doubleword of each pair copied 
to the destination. 

For the 128-bit form of the instruction, the following operations are performed:

dest[31:0] = src1[31:0]
dest[63:32] = src2[31:0]
dest[95:64] = src1[63:32]
dest[127:96] = src2[63:32]

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[159:128] = src1[159:128]
dest[191:160] = src2[159:128]
dest[223:192] = src1[191:160]
dest[255:224] = src2[191:160]

When the second source operand is all 0s, the destination effectively receives the two low-order
doublewords of the first source operand (128-bit form), or the two low-order doublewords of each
octword of the first source operand (256-bit form), with each doubleword zero-extended to 64 bits.
This operation is useful for expanding unsigned 32-bit values to unsigned 64-bit operands for
subsequent processing that requires higher precision.
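
One consequence of the per-octword definition is that the 256-bit form does not interleave doublewords 0 through 3 of the whole register. It takes doublewords 0 and 1 of each octword instead. The sketch below makes this visible with labeled elements. The function vpunpckldq_256 and the list-of-doublewords representation are assumptions for illustration.

```python
def vpunpckldq_256(src1: list, src2: list) -> list:
    """Model of the 256-bit form; operands are lists of 8 doublewords, low first."""
    assert len(src1) == 8 and len(src2) == 8
    dest = []
    for lane in (0, 4):  # index of the first doubleword of each octword
        dest += [src1[lane], src2[lane], src1[lane + 1], src2[lane + 1]]
    return dest


a = ["a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7"]
b = ["b0", "b1", "b2", "b3", "b4", "b5", "b6", "b7"]
result = vpunpckldq_256(a, b)

# A contiguous interleave of the low four doublewords, shown for contrast only.
contiguous = ["a0", "b0", "a1", "b1", "a2", "b2", "a3", "b3"]
```

In this model the result is a0, b0, a1, b1, a4, b4, a5, b5, which differs from the contiguous interleave in its upper octword.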


There are legacy and extended forms of the instruction: 

PUNPCKLDQ 

The first source operand is an XMM register and the second source operand is an XMM register or 
128-bit memory location. The first source operand is also the destination register. Bits [255:128] of 
the YMM register that corresponds to the destination are not affected. 

VPUNPCKLDQ 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 
Instruction Support

Form                   Subset   Feature Flag
PUNPCKLDQ              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPUNPCKLDQ 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPUNPCKLDQ 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                              Opcode        Description
PUNPCKLDQ xmm1, xmm2/mem128           66 0F 62 /r   Unpacks and interleaves the low-order doublewords
                                                    of xmm1 and xmm2 or mem128. Writes the
                                                    doublewords to xmm1.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPUNPCKLDQ xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   62 /r
VPUNPCKLDQ ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   62 /r


Related Instructions 

(V)PUNPCKHBW, (V)PUNPCKHDQ, (V)PUNPCKHQDQ, (V)PUNPCKHWD, (V)PUNPCKLBW,
(V)PUNPCKLQDQ, (V)PUNPCKLWD

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                Mode
Exception                   Real Virt Prot   Cause of Exception
Invalid opcode, #UD          X    X    X     Instruction not supported, as indicated by CPUID feature identifier.
                             A    A          AVX instructions are only recognized in protected mode.
                             S    S    S     CR0.EM = 1.
                             S    S    S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S    S    X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S    S    X     CR0.TS = 1.
Stack, #SS                   S    S    X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S    S    X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC         S    S    S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S    X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PUNPCKLQDQ
VPUNPCKLQDQ

Unpack and Interleave Low Quadwords

Unpacks the low-order quadword of each octword of the first and second source operands and
interleaves the quadwords as they are copied to the destination. The high-order quadword of each
octword of the source operands is ignored.

Quadwords are interleaved in ascending order from the least-significant quadword of the sources with 
quadwords from the first source operand occupying the lower quadword of each pair copied to the 
destination. 

For the 128-bit form of the instruction, the following operations are performed: 

dest[63:0] = src1[63:0] 
dest[127:64] = src2[63:0] 

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[191:128] = src1[191:128]
dest[255:192] = src2[191:128]

When the second source operand is all 0s, the destination effectively receives the low-order
quadword of the first source operand (128-bit form), or the low-order quadword of each octword of
the first source operand (256-bit form), zero-extended to 128 bits. This operation is useful for
expanding unsigned 64-bit values to unsigned 128-bit operands for subsequent processing that
requires higher precision.
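
For the 128-bit form, the two assignments above reduce to a shift and an OR when the operands are viewed as integers. The sketch below is a minimal model under that assumption. The name punpcklqdq_128 and the sample values are hypothetical.

```python
MASK64 = (1 << 64) - 1


def punpcklqdq_128(src1: int, src2: int) -> int:
    """dest[63:0] = src1[63:0]; dest[127:64] = src2[63:0] (software model)."""
    return ((src2 & MASK64) << 64) | (src1 & MASK64)


a = 0x1111111111111111_AAAAAAAAAAAAAAAA  # sample 128-bit operand
b = 0x2222222222222222_BBBBBBBBBBBBBBBB  # sample 128-bit operand

combined = punpcklqdq_128(a, b)   # low quadwords of a and b side by side
zero_extended = punpcklqdq_128(a, 0)  # low quadword of a, zero-extended to 128 bits
```

Here the combined value is BBBBBBBBBBBBBBBB_AAAAAAAAAAAAAAAAh, and the zero-extended value is AAAAAAAAAAAAAAAAh.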


There are legacy and extended forms of the instruction:

PUNPCKLQDQ 

The first source operand is an XMM register and the second source operand is an XMM register or 
128-bit memory location. The first source operand is also the destination register. Bits [255:128] of 
the YMM register that corresponds to the destination are not affected. 

VPUNPCKLQDQ 

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support

Form                   Subset   Feature Flag
PUNPCKLQDQ             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPUNPCKLQDQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPUNPCKLQDQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                              Opcode        Description
PUNPCKLQDQ xmm1, xmm2/mem128          66 0F 6C /r   Unpacks and interleaves the low-order
                                                    quadwords of xmm1 and xmm2 or mem128.
                                                    Writes the quadwords to xmm1.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPUNPCKLQDQ xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   6C /r
VPUNPCKLQDQ ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src1.1.01   6C /r


Related Instructions 

(V)PUNPCKHBW, (V)PUNPCKHDQ, (V)PUNPCKHQDQ, (V)PUNPCKHWD, (V)PUNPCKLBW,
(V)PUNPCKLDQ, (V)PUNPCKLWD

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                Mode
Exception                   Real Virt Prot   Cause of Exception
Invalid opcode, #UD          X    X    X     Instruction not supported, as indicated by CPUID feature identifier.
                             A    A          AVX instructions are only recognized in protected mode.
                             S    S    S     CR0.EM = 1.
                             S    S    S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S    S    X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S    S    X     CR0.TS = 1.
Stack, #SS                   S    S    X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S    S    X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC         S    S    S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S    X     Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PUNPCKLWD
VPUNPCKLWD

Unpack and Interleave Low Words


Unpacks the four low-order words of each octword of the first and second source operands and
interleaves the words as they are copied to the destination. The high-order words of each octword
of the source operands are ignored.

Words are interleaved in ascending order from the least-significant word of the source operands with 
words from the first source operand occupying the lower word of each pair copied to the destination. 

For the 128-bit form of the instruction, the following operations are performed: 

dest[15:0] = src1[15:0]
dest[31:16] = src2[15:0]
dest[47:32] = src1[31:16]
dest[63:48] = src2[31:16]
dest[79:64] = src1[47:32]
dest[95:80] = src2[47:32]
dest[111:96] = src1[63:48]
dest[127:112] = src2[63:48]

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[143:128] = src1[143:128]
dest[159:144] = src2[143:128]
dest[175:160] = src1[159:144]
dest[191:176] = src2[159:144]
dest[207:192] = src1[175:160]
dest[223:208] = src2[175:160]
dest[239:224] = src1[191:176]
dest[255:240] = src2[191:176]


When the second source operand is all 0s, the destination effectively receives the four low-order words
of the first source operand (or, for the 256-bit form, the four low-order words of each octword of the
first source operand), each zero-extended to 32 bits. This operation is useful for expanding unsigned
16-bit values to unsigned 32-bit operands for subsequent processing that requires higher precision.
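
The interleave described above can be modeled in a few lines of Python. This is only an illustrative sketch, not a description of the hardware: the function names `punpcklwd` and `words_to_dwords` are hypothetical, and operands are assumed to be given as lists of 16-bit words with the least-significant word first (8 words for an XMM operand, 16 for a YMM operand).

```python
def punpcklwd(src1, src2):
    """Interleave the four low-order words of each 128-bit lane (octword).

    src1 and src2 are lists of 16-bit words, least-significant word first.
    """
    if len(src1) != len(src2) or len(src1) not in (8, 16):
        raise ValueError("operands must both hold 8 or 16 words")
    dest = []
    for lane in range(0, len(src1), 8):       # one octword (8 words) per lane
        for i in range(4):                    # only the low four words are read
            dest.append(src1[lane + i] & 0xFFFF)  # src1 word fills the lower slot
            dest.append(src2[lane + i] & 0xFFFF)  # src2 word fills the upper slot
    return dest


def words_to_dwords(words):
    """Combine consecutive word pairs into 32-bit values (low word first)."""
    return [words[i] | (words[i + 1] << 16) for i in range(0, len(words), 2)]
```

With an all-zero second operand, `words_to_dwords(punpcklwd(src1, [0] * 8))` returns the four low words of `src1` as zero-extended 32-bit values, which is the widening idiom mentioned above.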


There are legacy and extended forms of the instruction: 

PUNPCKLWD 

The first source operand is an XMM register and the second source operand is either an XMM register
or a 128-bit memory location. The first source operand is also the destination register. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.

VPUNPCKLWD

The extended form of the instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register
that corresponds to the destination are cleared.

YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support

Form                  Subset   Feature Flag
PUNPCKLWD             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPUNPCKLWD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPUNPCKLWD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E
of Volume 3.
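
As an illustration of how the feature flags listed above might be tested, the following Python sketch checks the three bits in register values that are assumed to have been read from CPUID already. The function and parameter names (`bit_is_set`, `supported_forms`, `fn1_ecx`, `fn1_edx`, `fn7_ebx`) are hypothetical. The sketch covers the CPUID bits only; it does not model the operating-system enablement conditions (CR4.OSXSAVE, XFEATURE_ENABLED_MASK) that appear in the exception table.

```python
# Bit positions as listed in the Instruction Support table.
SSE2_BIT = 26   # CPUID Fn0000_0001, EDX
AVX_BIT = 28    # CPUID Fn0000_0001, ECX
AVX2_BIT = 5    # CPUID Fn0000_0007, EBX, subfunction 0


def bit_is_set(reg, bit):
    """Return True when the given bit of a 32-bit register value is 1."""
    return (reg >> bit) & 1 == 1


def supported_forms(fn1_ecx, fn1_edx, fn7_ebx):
    """List the instruction forms whose CPUID feature bit is set."""
    forms = []
    if bit_is_set(fn1_edx, SSE2_BIT):
        forms.append("PUNPCKLWD")
    if bit_is_set(fn1_ecx, AVX_BIT):
        forms.append("VPUNPCKLWD 128-bit")
    if bit_is_set(fn7_ebx, AVX2_BIT):
        forms.append("VPUNPCKLWD 256-bit")
    return forms
```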


Instruction Encoding

Mnemonic                        Opcode         Description
PUNPCKLWD xmm1, xmm2/mem128     66 0F 61 /r    Unpacks and interleaves the low-order words of xmm1 and xmm2 or mem128. Writes the words to xmm1.

                                       Encoding
Mnemonic                               VEX   RXB.map_select   W.vvvv.L.pp    Opcode
VPUNPCKLWD xmm1, xmm2, xmm3/mem128     C4    RXB.01           X.src1.0.01    61 /r
VPUNPCKLWD ymm1, ymm2, ymm3/mem256     C4    RXB.01           X.src1.1.01    61 /r


Related Instructions 

(V)PUNPCKHBW, (V)PUNPCKHDQ, (V)PUNPCKHQDQ, (V)PUNPCKHWD, (V)PUNPCKLBW,
(V)PUNPCKLDQ, (V)PUNPCKLQDQ

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                             S     S     S    CR0.EM = 1.
                             S     S     S    CR4.OSFXSR = 0.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    VEX.L = 1 when AVX2 not supported.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X    CR0.TS = 1.
Stack, #SS                   S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Alignment check, #AC         S     S     S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A    Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X    Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
PXOR Packed Exclusive OR

VPXOR 

Performs a bitwise XOR of the first and second source operands and writes the result to the destination.
When exactly one of a pair of corresponding bits in the first and second operands is set, the
corresponding bit of the destination is set; when both source bits are set or when both source bits are
not set, the destination bit is cleared.
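
The bit rule above can be checked with a small Python model. This is a sketch only: the function name `pxor` and the `width` parameter are hypothetical, and register contents are assumed to be represented as non-negative Python integers.

```python
def pxor(src1, src2, width=128):
    """Bitwise XOR of two register images, truncated to `width` bits."""
    mask = (1 << width) - 1
    return (src1 ^ src2) & mask
```

Because a bit pair that is set in both operands produces a cleared result bit, `pxor(x, x)` is zero for any `x`.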


There are legacy and extended forms of the instruction: 

PXOR 

The first source operand is an XMM register and the second source operand is either an XMM register
or a 128-bit memory location. The first source XMM register is also the destination. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.

VPXOR

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register
that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support

Form             Subset   Feature Flag
PXOR             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPXOR 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPXOR 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E
of Volume 3.


Instruction Encoding

Mnemonic                   Opcode         Description
PXOR xmm1, xmm2/mem128     66 0F EF /r    Performs bitwise XOR of values in xmm1 and xmm2 or mem128. Writes the result to xmm1.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp    Opcode
VPXOR xmm1, xmm2, xmm3/mem128     C4    RXB.01           X.src1.0.01    EF /r
VPXOR ymm1, ymm2, ymm3/mem256     C4    RXB.01           X.src1.1.01    EF /r


Related Instructions 

(V)PAND, (V)PANDN, (V)POR 
rFLAGS Affected

None 

MXCSR Flags Affected 

None 


Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                             S     S     S    CR0.EM = 1.
                             S     S     S    CR4.OSFXSR = 0.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    VEX.L = 1 when AVX2 not supported.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X    CR0.TS = 1.
Stack, #SS                   S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Alignment check, #AC         S     S     S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A    Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X    Instruction execution caused a page fault.

X— SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
RCPPS Reciprocal

VRCPPS Packed Single-Precision Floating-Point 

Computes the approximate reciprocal of each packed single-precision floating-point value in the
source operand and writes the results to the corresponding doubleword of the destination.
MXCSR.RC has no effect on the result.

The maximum error is less than or equal to 1.5 * 2^-12 times the true reciprocal. A source value that is
±zero or denormal returns an infinity of the source value sign. Results that underflow are changed to
signed zero. For both SNaN and QNaN source operands, a QNaN is returned.
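
The error bound and the special cases above can be expressed as a small Python reference model. This is a sketch under stated assumptions, not the hardware algorithm: the names `rcp_reference` and `rcp_within_bound` are hypothetical, Python double-precision arithmetic stands in for the exact reciprocal, a relative error bound of 1.5 * 2^-12 is assumed, and 2^-126 is taken as the smallest normal single-precision magnitude.

```python
import math

MAX_REL_ERROR = 1.5 * 2.0 ** -12   # assumed bound, relative to the true reciprocal
FLT_MIN = 2.0 ** -126              # smallest normal single-precision magnitude


def rcp_reference(x):
    """Exact-arithmetic stand-in for one element, special cases included."""
    if math.isnan(x):
        return math.nan                      # SNaN and QNaN sources both yield a QNaN
    if abs(x) < FLT_MIN:                     # +/-zero or denormal source
        return math.copysign(math.inf, x)    # infinity with the source sign
    result = 1.0 / x
    if abs(result) < FLT_MIN:                # result would underflow
        return math.copysign(0.0, x)         # signed zero
    return result


def rcp_within_bound(x, approx):
    """Check that an approximation of 1/x meets the assumed error bound."""
    exact = 1.0 / x
    return abs(approx - exact) <= MAX_REL_ERROR * abs(exact)
```

A check such as `rcp_within_bound(x, approx)` is one way to validate a software fallback against the documented tolerance; it says nothing about which approximation a given processor actually returns.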


There are legacy and extended forms of the instruction:

RCPPS 

Computes four reciprocals. The source operand is either an XMM register or a 128-bit memory location.
The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the
destination are not affected.

VRCPPS 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding 

Computes four reciprocals. The source operand is either an XMM register or a 128-bit memory location.
The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.

YMM Encoding

Computes eight reciprocals. The source operand is either a YMM register or a 256-bit memory location.
The destination is a YMM register.


Instruction Support

Form      Subset   Feature Flag
RCPPS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VRCPPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E
of Volume 3.


Instruction Encoding

Mnemonic                    Opcode      Description
RCPPS xmm1, xmm2/mem128     0F 53 /r    Computes reciprocals of packed single-precision floating-point values in xmm2 or mem128. Writes result to xmm1.

                             Encoding
Mnemonic                     VEX   RXB.map_select   W.vvvv.L.pp    Opcode
VRCPPS xmm1, xmm2/mem128     C4    RXB.01           X.1111.0.00    53 /r
VRCPPS ymm1, ymm2/mem256     C4    RXB.01           X.1111.1.00    53 /r
Related Instructions

(V)RCPSS, (V)RSQRTPS, (V)RSQRTSS 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                             S     S     S    CR0.EM = 1.
                             S     S     S    CR4.OSFXSR = 0.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    VEX.vvvv != 1111b.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X    CR0.TS = 1.
Stack, #SS                   S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Alignment check, #AC         S     S     S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A    Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                    S     X    Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
RCPSS Reciprocal

VRCPSS Scalar Single-Precision Floating-Point 

Computes the approximate reciprocal of the scalar single-precision floating-point value in a source
operand and writes the result to the low-order doubleword of the destination. MXCSR.RC has no
effect on the result.

The maximum error is less than or equal to 1.5 * 2^-12 times the true reciprocal. A source value that is
±zero or denormal returns an infinity of the source value sign. Results that underflow are changed to
signed zero. For both SNaN and QNaN source operands, a QNaN is returned.


There are legacy and extended forms of the instruction: 

RCPSS 

The source operand is either an XMM register or a 32-bit memory location. The destination is an 
XMM register. Bits [127:32] of the destination are not affected. Bits [255:128] of the YMM register 
that corresponds to the destination are not affected. 

VRCPSS 

The extended form of the instruction has a 128-bit encoding only.

The first source operand and the destination are XMM registers. The second source operand is either 
an XMM register or a 32-bit memory location. Bits [31:0] of the destination contain the reciprocal; 
bits [127:32] of the destination are copied from the first source register. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. 
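
The merge behavior of the extended form can be sketched in Python as follows. The function name `vrcpss_model` and the list-of-lanes representation (least-significant doubleword first) are assumptions made for illustration, and an exact reciprocal stands in for the hardware approximation.

```python
def vrcpss_model(src1, src2_low, rcp=lambda v: 1.0 / v):
    """Return the eight doubleword lanes of the destination YMM register.

    src1 holds the four lanes of the first source XMM register;
    src2_low is the scalar taken from bits [31:0] of the second source.
    """
    if len(src1) != 4:
        raise ValueError("src1 must hold four single-precision lanes")
    low = [rcp(src2_low)] + list(src1[1:])   # [31:0] result, [127:32] from src1
    return low + [0.0] * 4                   # bits [255:128] are cleared
```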


Instruction Support

Form      Subset   Feature Flag
RCPSS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VRCPSS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E
of Volume 3.



Instruction Encoding

Mnemonic                   Opcode         Description
RCPSS xmm1, xmm2/mem32     F3 0F 53 /r    Computes reciprocal of scalar single-precision floating-point value in xmm2 or mem32. Writes the result to xmm1.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp    Opcode
VRCPSS xmm1, xmm2, xmm3/mem32     C4    RXB.01           X.src1.X.10    53 /r


Related Instructions 

(V)RCPPS, (V)RSQRTPS, (V)RSQRTSS 

rFLAGS Affected 

None 
MXCSR Flags Affected

None 


Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                             S     S     S    CR0.EM = 1.
                             S     S     S    CR4.OSFXSR = 0.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X    CR0.TS = 1.
Stack, #SS                   S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                    S     X    Instruction execution caused a page fault.
Alignment check, #AC               S     X    Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
ROUNDPD Round

VROUNDPD Packed Double-Precision Floating-Point 

Rounds two or four double-precision floating-point values as specified by an immediate byte operand.
Source values are rounded to integral values and written to the destination as double-precision
floating-point values.

SNaN source values are converted to QNaN. When DAZ = 1, denormals are converted to zero before
rounding.

The immediate byte operand is defined as follows.

Bit:     7:4        3    2    1:0
Field:   Reserved   P    O    RC

Bits    Mnemonic   Description
[7:4]   -          Reserved
[3]     P          Precision Exception
[2]     O          Rounding Control Source
[1:0]   RC         Rounding Control

Precision exception definitions:

Value   Description
0       Normal PE exception
1       PE field is not updated. No precision exception is taken when unmasked.

Rounding control source definitions:

Value   Description
0       Use RC from immediate operand
1       Use RC from MXCSR

Rounding control definition:

Value   Description
00      Nearest
01      Downward (toward negative infinity)
10      Upward (toward positive infinity)
11      Truncated
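
The field definitions above can be illustrated with a short Python sketch that decodes the immediate byte and applies the selected rounding mode to one value. The names `decode_round_imm8`, `round_to_integral`, and `mxcsr_rc` are hypothetical. The sketch assumes that "Nearest" means round-half-to-even, uses Python floats (double precision) for the element, and does not model exception flags, DAZ, or the SNaN-to-QNaN conversion.

```python
import math


def decode_round_imm8(imm8):
    """Split the immediate byte into the P, O, and RC fields."""
    return {
        "suppress_precision": bool((imm8 >> 3) & 1),  # P, bit 3
        "rc_from_mxcsr": bool((imm8 >> 2) & 1),       # O, bit 2
        "rc": imm8 & 0b11,                            # RC, bits 1:0
    }


def round_to_integral(value, imm8, mxcsr_rc=0b00):
    """Round one value to an integral floating-point value."""
    fields = decode_round_imm8(imm8)
    rc = mxcsr_rc if fields["rc_from_mxcsr"] else fields["rc"]
    if math.isnan(value) or math.isinf(value):
        return value                       # nothing to round
    if rc == 0b00:
        rounded = round(value)             # nearest, ties to the even integer
    elif rc == 0b01:
        rounded = math.floor(value)        # toward negative infinity
    elif rc == 0b10:
        rounded = math.ceil(value)         # toward positive infinity
    else:
        rounded = math.trunc(value)        # toward zero
    return math.copysign(float(rounded), value)   # keep the source sign on zero
```

For example, an immediate of 0Bh decodes to P = 1, O = 0, RC = 11b, which truncates using the immediate's rounding control and suppresses the precision exception.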


There are legacy and extended forms of the instruction: 

ROUNDPD 

Rounds two source values. The source operand is either an XMM register or a 128-bit memory location.
The destination is an XMM register. There is a third 8-bit immediate operand. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.

VROUNDPD

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Rounds two source values. The source operand is either an XMM register or a 128-bit memory location.
The destination is an XMM register. There is a third 8-bit immediate operand. Bits [255:128] of the
YMM register that corresponds to the destination are cleared.

YMM Encoding

Rounds four source values. The source operand is either a YMM register or a 256-bit memory location.
The destination is a YMM register. There is a third 8-bit immediate operand.

Instruction Support

Form        Subset   Feature Flag
ROUNDPD     SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VROUNDPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E
of Volume 3.


Instruction Encoding

Mnemonic                            Opcode               Description
ROUNDPD xmm1, xmm2/mem128, imm8     66 0F 3A 09 /r ib    Rounds double-precision floating-point values in xmm2 or mem128. Writes rounded double-precision values to xmm1.

                                     Encoding
Mnemonic                             VEX   RXB.map_select   W.vvvv.L.pp    Opcode
VROUNDPD xmm1, xmm2/mem128, imm8     C4    RXB.03           X.1111.0.01    09 /r ib
VROUNDPD ymm1, ymm2/mem256, imm8     C4    RXB.03           X.1111.1.01    09 /r ib

Related Instructions

(V)ROUNDPS, (V)ROUNDSD, (V)ROUNDSS

rFLAGS Affected

None

MXCSR Flags Affected

Flag:  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                                   M                   M
Bit:   17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.
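
As an illustration of the bit positions shown in the table above, the following Python sketch extracts flags from an MXCSR image. The names `MXCSR_BITS`, `mxcsr_flags_set`, and `rounding_control` are hypothetical, and the MXCSR value is assumed to have been read elsewhere (for example by software using STMXCSR).

```python
# Bit positions as listed in the MXCSR Flags Affected table.
MXCSR_BITS = {
    "IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5, "DAZ": 6,
    "IM": 7, "DM": 8, "ZM": 9, "OM": 10, "UM": 11, "PM": 12,
    "FZ": 15, "MM": 17,
}
RC_SHIFT = 13   # RC occupies bits 14:13


def mxcsr_flags_set(mxcsr, names=("IE", "PE")):
    """Return the requested flag names whose bit is 1 in the MXCSR image."""
    return [name for name in names if (mxcsr >> MXCSR_BITS[name]) & 1]


def rounding_control(mxcsr):
    """Return the two-bit RC field."""
    return (mxcsr >> RC_SHIFT) & 0b11
```

The default `names` argument covers the two flags marked M for this instruction, IE and PE.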
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                             S     S     S    CR0.EM = 1.
                             S     S     S    CR4.OSFXSR = 0.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    VEX.vvvv != 1111b.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X    Lock prefix (F0h) preceding opcode.
                             S     S     X    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM    S     S     X    CR0.TS = 1.
Stack, #SS                   S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X    Memory address exceeding data segment limit or non-canonical.
                             S     S     S    Non-aligned memory operand while MXCSR.MM = 0.
                                         X    Null data segment used to reference memory.
Alignment check, #AC         S     S     S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A    Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                    S     X    Instruction execution caused a page fault.
SIMD floating-point, #XF     S     S     X    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE        S     S     X    A source operand was an SNaN value.
                             S     S     X    Undefined operation.
Precision, PE                S     S     X    A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
ROUNDPS Round

VROUNDPS Packed Single-Precision Floating-Point 

Rounds four or eight single-precision floating-point values as specified by an immediate byte operand.
Source values are rounded to integral values and written to the destination as single-precision
floating-point values.

SNaN source values are converted to QNaN. When DAZ = 1, denormals are converted to zero before
rounding.

The immediate byte operand is defined as follows.

Bit:     7:4        3    2    1:0
Field:   Reserved   P    O    RC

Bits    Mnemonic   Description
[7:4]   -          Reserved
[3]     P          Precision Exception
[2]     O          Rounding Control Source
[1:0]   RC         Rounding Control

Precision exception definitions:

Value   Description
0       Normal PE exception
1       PE field is not updated. No precision exception is taken when unmasked.

Rounding control source definitions:

Value   Description
0       Use RC from immediate operand
1       Use RC from MXCSR

Rounding control definition:

Value   Description
00      Nearest
01      Downward (toward negative infinity)
10      Upward (toward positive infinity)
11      Truncated


There are legacy and extended forms of the instruction: 

ROUNDPS 

Rounds four source values. The source operand is either an XMM register or a 128-bit memory location.
The destination is an XMM register. There is a third 8-bit immediate operand. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.

VROUNDPS

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Rounds four source values. The source operand is either an XMM register or a 128-bit memory location.
The destination is an XMM register. There is a third 8-bit immediate operand. Bits [255:128] of the
YMM register that corresponds to the destination are cleared.

YMM Encoding

Rounds eight source values. The source operand is either a YMM register or a 256-bit memory location.
The destination is a YMM register. There is a third 8-bit immediate operand.

Instruction Support

Form        Subset   Feature Flag
ROUNDPS     SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VROUNDPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E
of Volume 3.


Instruction Encoding

Mnemonic                            Opcode               Description
ROUNDPS xmm1, xmm2/mem128, imm8     66 0F 3A 08 /r ib    Rounds single-precision floating-point values in xmm2 or mem128. Writes rounded single-precision values to xmm1.

                                     Encoding
Mnemonic                             VEX   RXB.map_select   W.vvvv.L.pp    Opcode
VROUNDPS xmm1, xmm2/mem128, imm8     C4    RXB.03           X.1111.0.01    08 /r ib
VROUNDPS ymm1, ymm2/mem256, imm8     C4    RXB.03           X.1111.1.01    08 /r ib


Related Instructions 

(V)ROUNDPD, (V)ROUNDSD, (V)ROUNDSS 

rFLAGS Affected 

None 


MXCSR Flags Affected

Flag:  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                                   M                   M
Bit:   17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                             S     S     S    CR0.EM = 1.
                             S     S     S    CR4.OSFXSR = 0.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    VEX.vvvv != 1111b.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X    Lock prefix (F0h) preceding opcode.
                             S     S     X    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM    S     S     X    CR0.TS = 1.
Stack, #SS                   S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X    Memory address exceeding data segment limit or non-canonical.
                             S     S     S    Non-aligned memory operand while MXCSR.MM = 0.
                                         X    Null data segment used to reference memory.
Alignment check, #AC         S     S     S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A    Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                    S     X    Instruction execution caused a page fault.
SIMD floating-point, #XF     S     S     X    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE        S     S     X    A source operand was an SNaN value.
                             S     S     X    Undefined operation.
Precision, PE                S     S     X    A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
ROUNDSD Round

VROUNDSD Scalar Double-Precision 

Rounds a scalar double-precision floating-point value as specified by an immediate byte operand.
Source values are rounded to integral values and written to the destination as double-precision
floating-point values.

SNaN source values are converted to QNaN. When DAZ = 1, denormals are converted to zero before
rounding.

The immediate byte operand is defined as follows.

Bit:     7:4        3    2    1:0
Field:   Reserved   P    O    RC

Bits    Mnemonic   Description
[7:4]   -          Reserved
[3]     P          Precision Exception
[2]     O          Rounding Control Source
[1:0]   RC         Rounding Control

Precision exception definitions:

Value   Description
0       Normal PE exception
1       PE field is not updated. No precision exception is taken when unmasked.

Rounding control source definitions:

Value   Description
0       Use RC from immediate operand
1       Use RC from MXCSR

Rounding control definition:

Value   Description
00      Nearest
01      Downward (toward negative infinity)
10      Upward (toward positive infinity)
11      Truncated


There are legacy and extended forms of the instruction: 

ROUNDSD 

The source operand is either an XMM register or a 64-bit memory location. When the source is an
XMM register, the value to be rounded must be in the low quadword. The destination is an XMM
register. There is a third 8-bit immediate operand. Bits [127:64] of the destination are not affected. Bits
[255:128] of the YMM register that corresponds to the destination XMM register are not affected.

VROUNDSD

The extended form of the instruction has a 128-bit encoding only. 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 64-bit memory location. The destination is a third XMM register. There is a fourth 8-bit immediate 
operand. Bits [127:64] of the destination are copied from the first source operand. Bits [255:128] of 
the YMM register that corresponds to the destination are cleared. 

Instruction Support

Form        Subset   Feature Flag
ROUNDSD     SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VROUNDSD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E
of Volume 3.


Instruction Encoding

Mnemonic                           Opcode               Description
ROUNDSD xmm1, xmm2/mem64, imm8     66 0F 3A 0B /r ib    Rounds a double-precision floating-point value in xmm2[63:0] or mem64. Writes a rounded double-precision value to xmm1.

                                          Encoding
Mnemonic                                  VEX   RXB.map_select   W.vvvv.L.pp    Opcode
VROUNDSD xmm1, xmm2, xmm3/mem64, imm8     C4    RXB.03           X.src1.X.01    0B /r ib

Related Instructions 

(V)ROUNDPD, (V)ROUNDPS, (V)ROUNDSS 

rFLAGS Affected 

None 


MXCSR Flags Affected

Flag:  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                                   M                   M
Bit:   17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                             S     S     S    CR0.EM = 1.
                             S     S     S    CR4.OSFXSR = 0.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X    Lock prefix (F0h) preceding opcode.
                             S     S     X    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM    S     S     X    CR0.TS = 1.
Stack, #SS                   S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                    S     X    Instruction execution caused a page fault.
Alignment check, #AC               S     X    Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF     S     S     X    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE        S     S     X    A source operand was an SNaN value.
                             S     S     X    Undefined operation.
Precision, PE                S     S     X    A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
ROUNDSS Round

VROUNDSS Scalar Single-Precision 

Rounds a scalar single-precision floating-point value as specified by an immediate byte operand.
Source values are rounded to integral values and written to the destination as single-precision
floating-point values.

SNaN source values are converted to QNaN. When DAZ = 1, denormals are converted to zero before
rounding.

The immediate byte operand is defined as follows.

Bit:     7:4        3    2    1:0
Field:   Reserved   P    O    RC

Bits    Mnemonic   Description
[7:4]   -          Reserved
[3]     P          Precision Exception
[2]     O          Rounding Control Source
[1:0]   RC         Rounding Control

Precision exception definitions:

Value   Description
0       Normal PE exception
1       PE field is not updated. No precision exception is taken when unmasked.

Rounding control source definitions:

Value   Description
0       Use RC from immediate operand
1       Use RC from MXCSR

Rounding control definition:

Value   Description
00      Nearest
01      Downward (toward negative infinity)
10      Upward (toward positive infinity)
11      Truncated


There are legacy and extended forms of the instruction: 

ROUNDSS 

The source operand is either an XMM register or a 32-bit memory location. When the source is an 
XMM register, the value to be rounded must be in the low doubleword. The destination is an XMM 
register. There is a third 8-bit immediate operand. Bits [127:32] of the destination are not affected. 
Bits [255:128] of the YMM register that corresponds to destination XMM register are not affected. 

VROUNDSS

The extended form of the instruction has a 128-bit encoding only. 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 32-bit memory location. The destination is a third XMM register. There is a fourth 8-bit immediate 
operand. Bits [127:32] of the destination are copied from the first source operand. Bits [255:128] of 
the YMM register that corresponds to the destination are cleared. 

Instruction Support

Form        Subset   Feature Flag
ROUNDSS     SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VROUNDSS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E
of Volume 3.


Instruction Encoding

Mnemonic                           Opcode               Description
ROUNDSS xmm1, xmm2/mem32, imm8     66 0F 3A 0A /r ib    Rounds a single-precision floating-point value in xmm2[31:0] or mem32. Writes a rounded single-precision value to xmm1.

                                          Encoding
Mnemonic                                  VEX   RXB.map_select   W.vvvv.L.pp    Opcode
VROUNDSS xmm1, xmm2, xmm3/mem32, imm8     C4    RXB.03           X.src1.X.01    0A /r ib


Related Instructions 

(V)ROUNDPD, (V)ROUNDPS, (V)ROUNDSD 

rFLAGS Affected 

None 


MXCSR Flags Affected

Flag:  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                                   M                   M
Bit:   17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
RSQRTPS Reciprocal Square Root

VRSQRTPS Packed Single-Precision Floating-Point 

Computes the approximate reciprocal of the square root of each packed single-precision floating-point
value in the source operand and writes the results to the corresponding doublewords of the
destination. MXCSR.RC has no effect on the result.

The maximum error is less than or equal to 1.5 * 2^-12 times the true reciprocal square root. A source
value that is ±zero or denormal returns an infinity of the source value's sign. Negative source values
other than -zero and -denormal return a QNaN floating-point indefinite value. For both SNaN and
QNaN source operands, a QNaN is returned.
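The special-case rules and the error bound above can be sketched in Python as follows. This is only an illustrative model: the helper names `rsqrt_special_cases` and `within_rsqrt_bound` are hypothetical, the bound is taken to be 1.5 * 2^-12 (relative), and "denormal" is assumed to mean a magnitude below 2^-126, the smallest normal single-precision value.

```python
import math

MAX_REL_ERROR = 1.5 * 2.0 ** -12  # relative error bound for the approximation


def rsqrt_special_cases(x: float):
    """Return the result for inputs with special handling, or None otherwise."""
    if math.isnan(x):
        return math.nan                    # SNaN or QNaN source returns a QNaN
    if x == 0.0 or abs(x) < 2.0 ** -126:   # +/-zero or single-precision denormal
        return math.copysign(math.inf, x)  # infinity with the sign of the source
    if x < 0.0:
        return math.nan                    # other negative values return a QNaN
    return None                            # ordinary input: hardware approximates


def within_rsqrt_bound(x: float, approx: float) -> bool:
    """Check an approximation of 1/sqrt(x) against the stated error bound."""
    exact = 1.0 / math.sqrt(x)
    return abs(approx - exact) <= MAX_REL_ERROR * exact
```

A caller that needs more accuracy than this bound provides would typically refine the approximation in software or use a square root followed by a division.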


There are legacy and extended forms of the instruction:

RSQRTPS

Computes four values. The first source operand is an XMM register. The second source operand is
either an XMM register or a 128-bit memory location. The first source register is also the destination.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VRSQRTPS

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

Computes four values. The destination is an XMM register. The source operand is either an XMM
register or a 128-bit memory location. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.

YMM Encoding

Computes eight values. The destination is a YMM register. The source operand is either a YMM
register or a 256-bit memory location.


Instruction Support

Form        Subset   Feature Flag
RSQRTPS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VRSQRTPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                     Opcode      Description
RSQRTPS xmm1, xmm2/mem128    0F 52 /r    Computes reciprocals of square roots of packed single-precision
                                         floating-point values in xmm2 or mem128. Writes result to xmm1.

Mnemonic                     Encoding
                             VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VRSQRTPS xmm1, xmm2/mem128   C4    RXB.01           X.1111.0.00   52 /r
VRSQRTPS ymm1, ymm2/mem256   C4    RXB.01           X.1111.1.00   52 /r
Related Instructions

(V)RSQRTSS, (V)SQRTPD, (V)SQRTPS, (V)SQRTSD, (V)SQRTSS 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
RSQRTSS Reciprocal Square Root

VRSQRTSS Scalar Single-Precision Floating-Point 

Computes the approximate reciprocal of the square root of the scalar single-precision floating-point
value in a source operand and writes the result to the low-order doubleword of the destination.
MXCSR.RC has no effect on the result.

The maximum error is less than or equal to 1.5 * 2^-12 times the true reciprocal square root. A source
value that is ±zero or denormal returns an infinity of the source value's sign. Negative source values
other than -zero and -denormal return a QNaN floating-point indefinite value. For both SNaN and
QNaN source operands, a QNaN is returned.


There are legacy and extended forms of the instruction:

RSQRTSS

The source operand is either an XMM register or a 32-bit memory location. The destination is an
XMM register. Bits [127:32] of the destination are not affected. Bits [255:128] of the YMM register
that corresponds to the destination are not affected.

VRSQRTSS

The extended form of the instruction has a 128-bit encoding only.

The first source operand and the destination are XMM registers. The second source operand is either
an XMM register or a 32-bit memory location. Bits [31:0] of the destination contain the reciprocal
square root of the single-precision floating-point value held in bits [31:0] of the second source
operand; bits [127:32] of the destination are copied from the first source register. Bits [255:128] of
the YMM register that corresponds to the destination are cleared.
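The difference between the legacy and extended forms is where the upper three doublewords of the result come from. The Python sketch below illustrates this with 4-element lists (element 0 is bits [31:0]). The function names are hypothetical, an exact `1.0 / math.sqrt` stands in for the hardware approximation, and the handling of bits [255:128] is not modelled.

```python
import math


def rsqrtss_legacy(dest, src):
    """Legacy form: element 0 is replaced, elements 1..3 of dest are preserved."""
    return [1.0 / math.sqrt(src[0])] + list(dest[1:4])


def vrsqrtss(src1, src2):
    """Extended form: element 0 comes from src2, elements 1..3 are copied from src1."""
    return [1.0 / math.sqrt(src2[0])] + list(src1[1:4])
```

In both cases only the low element of the second operand participates in the computation.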

Instruction Support

Form        Subset   Feature Flag
RSQRTSS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VRSQRTSS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                           Opcode         Description
RSQRTSS xmm1, xmm2/mem32           F3 0F 52 /r    Computes reciprocal of square root of a scalar single-precision
                                                  floating-point value in xmm2 or mem32. Writes result to xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VRSQRTSS xmm1, xmm2, xmm3/mem32    C4    RXB.01           X.src1.X.10   52 /r

Related Instructions 

(V)RSQRTPS, (V)SQRTPD, (V)SQRTPS, (V)SQRTSD, (V)SQRTSS 
rFLAGS Affected

None 

MXCSR Flags Affected 

None 


Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
SHA1RNDS4 Four Rounds of SHA1

Execute 4 rounds of a SHA1 operation using the 4 double words (A, B, C, D) from the first source
operand, and value E from the second operand. The lower two bits of the immediate are used to
specify the function and constant appropriate for the current round of processing. The resulting
(A, B, C, D) is placed in the destination register which is the same as the first source register.

The following function is performed: 


A <- SRC1[127:96];
B <- SRC1[95:64];
C <- SRC1[63:32];
D <- SRC1[31:0];
W0E <- SRC2[127:96];
W1 <- SRC2[95:64];
W2 <- SRC2[63:32];
W3 <- SRC2[31:0];

i = imm[1:0], which determines f_i and K_i

First round operation:
A_1 <- f_i(B, C, D) + (A Rotate Left 5) + W0E + K_i;
B_1 <- A;
C_1 <- B Rotate Left 30;
D_1 <- C;
E_1 <- D;

FOR j = 1 to 3
{
    A_(j+1) <- f_i(B_j, C_j, D_j) + (A_j Rotate Left 5) + W_j + E_j + K_i;
    B_(j+1) <- A_j;
    C_(j+1) <- B_j Rotate Left 30;
    D_(j+1) <- C_j;
    E_(j+1) <- D_j;
}

DEST[127:96] <- A_4;
DEST[95:64] <- B_4;
DEST[63:32] <- C_4;
DEST[31:0] <- D_4;
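The operation above can be modelled in a few lines of Python, which may help when checking an implementation against a known-good SHA-1. This is a sketch under stated assumptions: operands are 4-tuples listing bits [127:96] first, the functions f_i and constants K_i are assumed to be the standard SHA-1 ones from FIPS 180-4 (they are not spelled out in this section), and the helper `sha1_single_block`, which handles only messages shorter than 56 bytes, is hypothetical.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
# Assumed: standard SHA-1 round constants, one per group of 20 rounds.
K = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)


def rol(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK


def f(i, b, c, d):
    if i == 0:
        return (b & c) | (~b & d & MASK)    # Ch
    if i == 2:
        return (b & c) | (b & d) | (c & d)  # Maj
    return b ^ c ^ d                        # Parity (i == 1 or i == 3)


def sha1rnds4(src1, src2, imm):
    """src1 = (A, B, C, D); src2 = (W0 + E, W1, W2, W3)."""
    i = imm & 3
    a, b, c, d = src1
    e = 0  # E is already folded into src2[0] for the first round
    for w in src2:
        a, b, c, d, e = (f(i, b, c, d) + rol(a, 5) + w + e + K[i]) & MASK, a, rol(b, 30), c, d
    return (a, b, c, d)


def sha1_single_block(msg):
    """Hash a message shorter than 56 bytes using sha1rnds4 for all 80 rounds."""
    block = msg + b"\x80" + bytes(55 - len(msg)) + struct.pack(">Q", 8 * len(msg))
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rol(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    iv = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
    abcd, e = iv[:4], iv[4]
    for t in range(0, 80, 4):
        src2 = ((w[t] + e) & MASK, w[t + 1], w[t + 2], w[t + 3])
        # After four rounds, E equals the previous A rotated left by 30.
        abcd, e = sha1rnds4(abcd, src2, t // 20), rol(abcd[0], 30)
    return struct.pack(">5I", *((x + y) & MASK for x, y in zip(abcd + (e,), iv)))


if __name__ == "__main__":
    assert sha1_single_block(b"abc") == hashlib.sha1(b"abc").digest()
```

The message schedule and the E update are done in plain Python here; in real code those steps would map onto SHA1MSG1, SHA1MSG2 and SHA1NEXTE.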


Mnemonic                           Opcode            Description
SHA1RNDS4 xmm1, xmm2/m128, imm8    0F 3A CC /r ib    Executes 4 Rounds of SHA1

Related Instructions 

SHA1NEXTE, SHA1MSG1, SHA1MSG2 
rFLAGS Affected

None 

MXCSR Flags Affected 

None 


Exceptions

                                  Virtual
Exception                   Real  8086     Protected  Cause of Exception
Invalid opcode, #UD         X     X        X          Instruction not supported by CPUID.
                            A     A                   AVX instructions are only recognized in protected mode.
                            S     S        S          CR0.EM = 1 OR CR4.OSFXSR = 0.
                                           A          CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           A          XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           A          VEX.L = 1 when AVX2 not supported.
                                           A          REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S        X          Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S        X          CR0.TS = 1.
Stack, #SS                  S     S        X          Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S        X          Memory address exceeding data segment limit or non-canonical.
                                           X          Null data segment used to reference memory.
Alignment check, #AC        S     S        S          Memory operand not 16-byte aligned when alignment checking
                                                      enabled and MXCSR.MM = 1.
                                           A          Alignment checking enabled and 256-bit memory operand not
                                                      32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page Fault, #PF                   S        X          A page fault resulted from the execution of the instruction.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
SHA1NEXTE Calculate Next E SHA1

Calculate what the next E register values should be after 4 rounds of a SHA1 operation using the 4 
double words from the second source operand, and value A from the first operand. The resulting E is 
placed in the destination register which is the same as the first source register. 

DEST[127:96] <- SRC2[127:96] + (SRC1[127:96] Rotate Left 30);

DEST[95:0] <- SRC2[95:0];
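A minimal Python model of this operation is shown below, assuming operands are 4-tuples of 32-bit values listing bits [127:96] first. The function and helper names are hypothetical.

```python
MASK = 0xFFFFFFFF


def rol(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK


def sha1nexte(src1, src2):
    """src1[0] holds the earlier A value; src2 holds the next four message words."""
    next_e = rol(src1[0], 30)
    # Only the top doubleword changes; the low 96 bits of src2 pass through.
    return ((src2[0] + next_e) & MASK,) + tuple(src2[1:])
```

The result has the shape expected by the second operand of SHA1RNDS4, where the top doubleword carries the sum of a message word and E.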

Mnemonic                     Opcode         Description
SHA1NEXTE xmm1, xmm2/m128    0F 38 C8 /r    Calculate Next E of SHA1

Related Instructions 

SHA1RNDS4, SHA1MSG1, SHA1MSG2 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 

Exceptions

                                  Virtual
Exception                   Real  8086     Protected  Cause of Exception
Invalid opcode, #UD         X     X        X          Instruction not supported by CPUID.
                            A     A                   AVX instructions are only recognized in protected mode.
                            S     S        S          CR0.EM = 1 OR CR4.OSFXSR = 0.
                                           A          CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           A          XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           A          VEX.L = 1 when AVX2 not supported.
                                           A          REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S        X          Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S        X          CR0.TS = 1.
Stack, #SS                  S     S        X          Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S        X          Memory address exceeding data segment limit or non-canonical.
                                           X          Null data segment used to reference memory.
Alignment check, #AC        S     S        S          Memory operand not 16-byte aligned when alignment checking
                                                      enabled and MXCSR.MM = 1.
                                           A          Alignment checking enabled and 256-bit memory operand not
                                                      32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page Fault, #PF                   S        X          A page fault resulted from the execution of the instruction.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
SHA1MSG1 Message Intermediate 1

Performs the 1st of two intermediate calculations necessary before doing the next four rounds of the 
SHA1 message. 


DEST[127:96] <- SRC1[63:32] XOR SRC1[127:96];
DEST[95:64]  <- SRC1[31:0] XOR SRC1[95:64];
DEST[63:32]  <- SRC2[127:96] XOR SRC1[63:32];
DEST[31:0]   <- SRC2[95:64] XOR SRC1[31:0];
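Read as message words, this operation XORs each word with the word two positions later. The Python sketch below assumes operands are 4-tuples listing bits [127:96] first, so that the first operand holds (W0, W1, W2, W3) and the second holds (W4, W5, W6, W7). The function name is hypothetical.

```python
def sha1msg1(src1, src2):
    """Return (W2^W0, W3^W1, W4^W2, W5^W3) for src1 = W0..W3 and src2 = W4..W7."""
    w0, w1, w2, w3 = src1
    w4, w5 = src2[0], src2[1]  # only the upper two doublewords of src2 are used
    return (w2 ^ w0, w3 ^ w1, w4 ^ w2, w5 ^ w3)
```

These four values are the W[t-14] XOR W[t-16] terms of the SHA-1 message schedule for t = 16 to 19.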


Mnemonic                    Opcode         Description
SHA1MSG1 xmm1, xmm2/m128    0F 38 C9 /r    Calculate Message Intermediate 1

Related Instructions 

SHA1RNDS4, SHA1NEXTE, SHA1MSG2 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 

Exceptions

                                  Virtual
Exception                   Real  8086     Protected  Cause of Exception
Invalid opcode, #UD         X     X        X          Instruction not supported by CPUID.
                            A     A                   AVX instructions are only recognized in protected mode.
                            S     S        S          CR0.EM = 1 OR CR4.OSFXSR = 0.
                                           A          CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           A          XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           A          VEX.L = 1 when AVX2 not supported.
                                           A          REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S        X          Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S        X          CR0.TS = 1.
Stack, #SS                  S     S        X          Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S        X          Memory address exceeding data segment limit or non-canonical.
                                           X          Null data segment used to reference memory.
Alignment check, #AC        S     S        S          Memory operand not 16-byte aligned when alignment checking
                                                      enabled and MXCSR.MM = 1.
                                           A          Alignment checking enabled and 256-bit memory operand not
                                                      32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page Fault, #PF                   S        X          A page fault resulted from the execution of the instruction.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
SHA1MSG2 Message Calculation 2

Performs the 2nd of two intermediate calculations necessary before doing the next four rounds of the 
SHA1 message. 


Temp[31:0]   <- (SRC1[127:96] XOR SRC2[95:64]) Rotate Left 1;
DEST[127:96] <- Temp[31:0];
DEST[95:64]  <- (SRC1[95:64] XOR SRC2[63:32]) Rotate Left 1;
DEST[63:32]  <- (SRC1[63:32] XOR SRC2[31:0]) Rotate Left 1;
DEST[31:0]   <- (SRC1[31:0] XOR Temp[31:0]) Rotate Left 1;
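The dependency of the last result doubleword on the first (through Temp) is easy to miss, so a small Python model may help. It assumes operands are 4-tuples listing bits [127:96] first. The names `sha1msg2` and `rol` are hypothetical.

```python
MASK = 0xFFFFFFFF


def rol(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK


def sha1msg2(src1, src2):
    """src1 holds the partially combined schedule words; src2 holds W12..W15."""
    w13, w14, w15 = src2[1], src2[2], src2[3]
    w16 = rol(src1[0] ^ w13, 1)  # Temp in the description above
    w17 = rol(src1[1] ^ w14, 1)
    w18 = rol(src1[2] ^ w15, 1)
    w19 = rol(src1[3] ^ w16, 1)  # depends on the freshly computed W16
    return (w16, w17, w18, w19)
```

Note that src2[0] (bits [127:96] of the second operand) does not take part in the computation.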


Mnemonic                    Opcode         Description
SHA1MSG2 xmm1, xmm2/m128    0F 38 CA /r    Calculate Message Intermediate 2

Related Instructions 

SHA1RNDS4, SHA1NEXTE, SHA1MSG1 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

                                  Virtual
Exception                   Real  8086     Protected  Cause of Exception
Invalid opcode, #UD         X     X        X          Instruction not supported by CPUID.
                            A     A                   AVX instructions are only recognized in protected mode.
                            S     S        S          CR0.EM = 1 OR CR4.OSFXSR = 0.
                                           A          CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           A          XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           A          VEX.L = 1 when AVX2 not supported.
                                           A          REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S        X          Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S        X          CR0.TS = 1.
Stack, #SS                  S     S        X          Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S        X          Memory address exceeding data segment limit or non-canonical.
                                           X          Null data segment used to reference memory.
Alignment check, #AC        S     S        S          Memory operand not 16-byte aligned when alignment checking
                                                      enabled and MXCSR.MM = 1.
                                           A          Alignment checking enabled and 256-bit memory operand not
                                                      32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page Fault, #PF                   S        X          A page fault resulted from the execution of the instruction.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
SHA256RNDS2 Two Rounds of SHA256

Performs 2 rounds of SHA256 operation with the first operand holding the initial SHA256 state (C,
D, G, H), the second operand holding the initial SHA256 state (A, B, E, F), and the implicit operand
XMM0 holding a pre-computed sum of the next two round message doublewords and the corresponding
round constants. The resulting SHA256 state (A, B, E, F) is placed in the destination register.


A_0 <- SRC2[127:96];
B_0 <- SRC2[95:64];
C_0 <- SRC1[127:96];
D_0 <- SRC1[95:64];
E_0 <- SRC2[63:32];
F_0 <- SRC2[31:0];
G_0 <- SRC1[63:32];
H_0 <- SRC1[31:0];
K_0 <- XMM0[31:0];
K_1 <- XMM0[63:32];

FOR i = 0 to 1
{
    A_(i+1) <- Ch(E_i, F_i, G_i) + Perm1(E_i) + K_i + H_i + Maj(A_i, B_i, C_i) + Perm0(A_i);
    B_(i+1) <- A_i;
    C_(i+1) <- B_i;
    D_(i+1) <- C_i;
    E_(i+1) <- Ch(E_i, F_i, G_i) + Perm1(E_i) + K_i + H_i + D_i;
    F_(i+1) <- E_i;
    G_(i+1) <- F_i;
    H_(i+1) <- G_i;
}

DEST[127:96] <- A_2;
DEST[95:64] <- B_2;
DEST[63:32] <- E_2;
DEST[31:0] <- F_2;
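The two-round update can be modelled in Python as shown below. This is a sketch under assumptions: operands are 4-tuples listing bits [127:96] first, and Ch, Maj, Perm0 and Perm1, which this section does not define, are assumed to be the SHA-256 Ch, Maj, big-sigma-0 and big-sigma-1 functions from FIPS 180-4. All function names are hypothetical.

```python
MASK = 0xFFFFFFFF


def ror(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def ch(e, f, g):
    return (e & f) ^ (~e & g & MASK)


def maj(a, b, c):
    return (a & b) ^ (a & c) ^ (b & c)


def perm0(a):  # assumed: SHA-256 big sigma 0
    return ror(a, 2) ^ ror(a, 13) ^ ror(a, 22)


def perm1(e):  # assumed: SHA-256 big sigma 1
    return ror(e, 6) ^ ror(e, 11) ^ ror(e, 25)


def sha256rnds2(src1, src2, xmm0_low):
    """src1 = (C, D, G, H); src2 = (A, B, E, F);
    xmm0_low = (XMM0[31:0], XMM0[63:32]), each a message word plus round constant."""
    c, d, g, h = src1
    a, b, e, f = src2
    for k in xmm0_low:
        t = (ch(e, f, g) + perm1(e) + k + h) & MASK
        a, b, c, d, e, f, g, h = (
            (t + maj(a, b, c) + perm0(a)) & MASK, a, b, c,
            (t + d) & MASK, e, f, g,
        )
    return (a, b, e, f)
```

After two rounds the new (C, D, G, H) equals the old (A, B, E, F), which is why software typically alternates the roles of two state registers between successive uses of the instruction.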


Mnemonic                             Opcode         Description
SHA256RNDS2 xmm1, xmm2/m128, xmm0    0F 38 CB /r    Execute 2 rounds of SHA256

Related Instructions

SHA256MSG1, SHA256MSG2

rFLAGS Affected

None

MXCSR Flags Affected

None
Exceptions

                                  Virtual
Exception                   Real  8086     Protected  Cause of Exception
Invalid opcode, #UD         X     X        X          Instruction not supported by CPUID.
                            A     A                   AVX instructions are only recognized in protected mode.
                            S     S        S          CR0.EM = 1 OR CR4.OSFXSR = 0.
                                           A          CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           A          XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           A          VEX.L = 1 when AVX2 not supported.
                                           A          REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S        X          Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S        X          CR0.TS = 1.
Stack, #SS                  S     S        X          Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S        X          Memory address exceeding data segment limit or non-canonical.
                                           X          Null data segment used to reference memory.
Alignment check, #AC        S     S        S          Memory operand not 16-byte aligned when alignment checking
                                                      enabled and MXCSR.MM = 1.
                                           A          Alignment checking enabled and 256-bit memory operand not
                                                      32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page Fault, #PF                   S        X          A page fault resulted from the execution of the instruction.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
SHA256MSG1 Message Intermediate 1

Performs the 1st of two intermediate calculations necessary for the next four SHA256 message 
dwords. 


DEST[127:96] <- SRC1[127:96] + Perm2(SRC2[31:0]);
DEST[95:64]  <- SRC1[95:64] + Perm2(SRC1[127:96]);
DEST[63:32]  <- SRC1[63:32] + Perm2(SRC1[95:64]);
DEST[31:0]   <- SRC1[31:0] + Perm2(SRC1[63:32]);
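Each result doubleword is a source word plus Perm2 of the next-higher word, with the lowest doubleword of the second operand feeding the top lane. The Python sketch below assumes operands are 4-tuples listing bits [127:96] first and assumes Perm2 is the SHA-256 small-sigma-0 function from FIPS 180-4, since this section does not define it. Function names are hypothetical.

```python
MASK = 0xFFFFFFFF


def ror(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def perm2(x):  # assumed: SHA-256 small sigma 0
    return ror(x, 7) ^ ror(x, 18) ^ (x >> 3)


def sha256msg1(src1, src2):
    """src1 = (W3, W2, W1, W0) with W0 in bits [31:0]; src2[31:0] supplies W4."""
    w3, w2, w1, w0 = src1
    w4 = src2[3]
    return (
        (w3 + perm2(w4)) & MASK,
        (w2 + perm2(w3)) & MASK,
        (w1 + perm2(w2)) & MASK,
        (w0 + perm2(w1)) & MASK,
    )
```

Unlike the SHA1 message instructions, the oldest message word sits in the lowest doubleword here.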


Mnemonic                      Opcode         Description
SHA256MSG1 xmm1, xmm2/m128    0F 38 CC /r    Calculate Message Intermediate 1

Related Instructions 

SHA256RNDS2, SHA256MSG2 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

                                  Virtual
Exception                   Real  8086     Protected  Cause of Exception
Invalid opcode, #UD         X     X        X          Instruction not supported by CPUID.
                            A     A                   AVX instructions are only recognized in protected mode.
                            S     S        S          CR0.EM = 1 OR CR4.OSFXSR = 0.
                                           A          CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           A          XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           A          VEX.L = 1 when AVX2 not supported.
                                           A          REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S        X          Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S        X          CR0.TS = 1.
Stack, #SS                  S     S        X          Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S        X          Memory address exceeding data segment limit or non-canonical.
                                           X          Null data segment used to reference memory.
Alignment check, #AC        S     S        S          Memory operand not 16-byte aligned when alignment checking
                                                      enabled and MXCSR.MM = 1.
                                           A          Alignment checking enabled and 256-bit memory operand not
                                                      32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page Fault, #PF                   S        X          A page fault resulted from the execution of the instruction.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
SHA256MSG2 Message Intermediate 2

Performs the 2nd of two intermediate calculations necessary for the next four SHA256 message 
dwords. 


Temp0 <- SRC1[31:0] + Perm3(SRC2[95:64]);
Temp1 <- SRC1[63:32] + Perm3(SRC2[127:96]);

DEST[127:96] <- SRC1[127:96] + Perm3(Temp1);
DEST[95:64]  <- SRC1[95:64] + Perm3(Temp0);
DEST[63:32]  <- SRC1[63:32] + Perm3(SRC2[127:96]);
DEST[31:0]   <- SRC1[31:0] + Perm3(SRC2[95:64]);
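The two upper result doublewords depend on the two lower ones through Temp0 and Temp1. The Python sketch below makes that ordering explicit. It assumes operands are 4-tuples listing bits [127:96] first and assumes Perm3 is the SHA-256 small-sigma-1 function from FIPS 180-4, since this section does not define it. Function names are hypothetical.

```python
MASK = 0xFFFFFFFF


def ror(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def perm3(x):  # assumed: SHA-256 small sigma 1
    return ror(x, 17) ^ ror(x, 19) ^ (x >> 10)


def sha256msg2(src1, src2):
    """Only the upper two doublewords of src2 are used."""
    temp0 = (src1[3] + perm3(src2[1])) & MASK  # SRC1[31:0]  + Perm3(SRC2[95:64])
    temp1 = (src1[2] + perm3(src2[0])) & MASK  # SRC1[63:32] + Perm3(SRC2[127:96])
    return (
        (src1[0] + perm3(temp1)) & MASK,
        (src1[1] + perm3(temp0)) & MASK,
        temp1,
        temp0,
    )
```

The low two result doublewords are exactly Temp1 and Temp0, so the model returns them directly.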


Mnemonic                      Opcode         Description
SHA256MSG2 xmm1, xmm2/m128    0F 38 CD /r    Calculate Message Intermediate 2

Related Instructions 

SHA256RNDS2, SHA256MSG1 

rFLAGS Affected 

None 


MXCSR Flags Affected 

None 


Exceptions

                                  Virtual
Exception                   Real  8086     Protected  Cause of Exception
Invalid opcode, #UD         X     X        X          Instruction not supported by CPUID.
                            A     A                   AVX instructions are only recognized in protected mode.
                            S     S        S          CR0.EM = 1 OR CR4.OSFXSR = 0.
                                           A          CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           A          XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           A          VEX.L = 1 when AVX2 not supported.
                                           A          REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S        X          Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S        X          CR0.TS = 1.
Stack, #SS                  S     S        X          Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S        X          Memory address exceeding data segment limit or non-canonical.
                                           X          Null data segment used to reference memory.
Alignment check, #AC        S     S        S          Memory operand not 16-byte aligned when alignment checking
                                                      enabled and MXCSR.MM = 1.
                                           A          Alignment checking enabled and 256-bit memory operand not
                                                      32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page Fault, #PF                   S        X          A page fault resulted from the execution of the instruction.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
SHUFPD Shuffle

VSHUFPD Packed Double-Precision Floating-Point 

Copies packed double-precision floating-point values from either of two sources to quadwords in the 
destination, as specified by bit fields of an immediate byte operand. 

Each bit corresponds to a quadword destination. The 128-bit legacy and extended versions of the 
instruction use bits [1:0]; the 256-bit extended version uses bits [3:0], as shown. 


Destination   Immediate-Byte   Value of    Source 1      Source 2
Quadword      Bit Field        Bit Field   Bits Copied   Bits Copied

Used by 128-bit encoding and 256-bit encoding
[63:0]        [0]              0           [63:0]        -
                               1           [127:64]      -
[127:64]      [1]              0           -             [63:0]
                               1           -             [127:64]

Used only by 256-bit encoding
[191:128]     [2]              0           [191:128]     -
                               1           [255:192]     -
[255:192]     [3]              0           -             [191:128]
                               1           -             [255:192]
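The selection table above can be expressed compactly in Python. The sketch below is illustrative only: the function name `shufpd` is hypothetical, and operands are lists of quadword values with the lowest quadword first (2 elements for the 128-bit forms, 4 for the 256-bit form).

```python
def shufpd(src1, src2, imm8):
    """Model the (V)SHUFPD selection for 2-element or 4-element operands."""
    dest = []
    for lane in range(0, len(src1), 2):  # one iteration per 128-bit lane
        # Even destination quadword: chosen from source 1 by bit `lane`.
        dest.append(src1[lane + ((imm8 >> lane) & 1)])
        # Odd destination quadword: chosen from source 2 by bit `lane + 1`.
        dest.append(src2[lane + ((imm8 >> (lane + 1)) & 1)])
    return dest
```

For example, with `src1 = [1.0, 2.0]`, `src2 = [3.0, 4.0]` and an immediate of 01b, the low result element is taken from the high half of source 1 and the high result element from the low half of source 2.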


There are legacy and extended forms of the instruction: 

SHUFPD 

Shuffles four source values. The first source operand is an XMM register. The second source operand 
is either an XMM register or a 128-bit memory location. There is a third 8-bit immediate operand. 
The first source register is also the destination. Bits [255:128] of the YMM register that corresponds 
to the destination are not affected. 

VSHUFPD 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Shuffles four source values. The first source operand is an XMM register. The second source operand 
is either an XMM register or a 128-bit memory location. The destination is a third XMM register. 
There is a fourth 8-bit immediate operand. Bits [255:128] of the YMM register that corresponds to the 
destination are cleared. 

YMM Encoding 

Shuffles eight source values. The first source operand is a YMM register and the second source oper¬ 
and is either a YMM register or a 256-bit memory location. The destination is a third YMM register. 
There is a fourth 8-bit immediate operand. 
Instruction Support

Form        Subset   Feature Flag
SHUFPD      SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VSHUFPD     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                                Opcode            Description
SHUFPD xmm1, xmm2/mem128, imm8          66 0F C6 /r ib    Shuffles packed double-precision floating-point
                                                          values in xmm1 and xmm2 or mem128. Writes the
                                                          result to xmm1.

Mnemonic                                Encoding
                                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VSHUFPD xmm1, xmm2, xmm3/mem128, imm8   C4    RXB.01           X.src1.0.01   C6 /r
VSHUFPD ymm1, ymm2, ymm3/mem256, imm8   C4    RXB.01           X.src1.1.01   C6 /r


Related Instructions 

(V)SHUFPS 

rFLAGS Affected 

None 


MXCSR Flags Affected 

None 



AMD64 Technology 


26568 — Rev. 3.23—February 2019 


Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                              and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
SHUFPS Shuffle

VSHUFPS Packed Single-Precision Floating-Point 

Copies packed single-precision floating-point values from either of two sources to doublewords in the 
destination, as specified by bit fields of an immediate byte operand. 

Each bit field corresponds to a doubleword destination. The 128-bit legacy and extended versions of
the instruction use a single 128-bit destination; the 256-bit extended version performs duplicate
operations on bits [127:0] and bits [255:128] of the source and destination.


Destination   Immediate-Byte   Value of    Source 1      Source 2
Doubleword    Bit Field        Bit Field   Bits Copied   Bits Copied
[31:0]        [1:0]            00          [31:0]        -
                               01          [63:32]       -
                               10          [95:64]       -
                               11          [127:96]      -
[63:32]       [3:2]            00          [31:0]        -
                               01          [63:32]       -
                               10          [95:64]       -
                               11          [127:96]      -
[95:64]       [5:4]            00          -             [31:0]
                               01          -             [63:32]
                               10          -             [95:64]
                               11          -             [127:96]
[127:96]      [7:6]            00          -             [31:0]
                               01          -             [63:32]
                               10          -             [95:64]
                               11          -             [127:96]

Upper 128 bits of 256-bit source and destination used by 256-bit encoding
[159:128]     [1:0]            00          [159:128]     -
                               01          [191:160]     -
                               10          [223:192]     -
                               11          [255:224]     -
[191:160]     [3:2]            00          [159:128]     -
                               01          [191:160]     -
                               10          [223:192]     -
                               11          [255:224]     -
[223:192]     [5:4]            00          -             [159:128]
                               01          -             [191:160]
                               10          -             [223:192]
                               11          -             [255:224]
[255:224]     [7:6]            00          -             [159:128]
                               01          -             [191:160]
                               10          -             [223:192]
                               11          -             [255:224]
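The table above reduces to four 2-bit selectors applied identically to each 128-bit lane. The Python sketch below illustrates this. The function name `shufps` is hypothetical, and operands are lists of doubleword values with the lowest doubleword first (4 elements for the 128-bit forms, 8 for the 256-bit form).

```python
def shufps(src1, src2, imm8):
    """Model the (V)SHUFPS selection for 4-element or 8-element operands."""
    dest = []
    for base in range(0, len(src1), 4):  # one iteration per 128-bit lane
        dest.append(src1[base + (imm8 & 3)])         # imm8[1:0] selects from source 1
        dest.append(src1[base + ((imm8 >> 2) & 3)])  # imm8[3:2] selects from source 1
        dest.append(src2[base + ((imm8 >> 4) & 3)])  # imm8[5:4] selects from source 2
        dest.append(src2[base + ((imm8 >> 6) & 3)])  # imm8[7:6] selects from source 2
    return dest
```

The same immediate is reused for the upper lane of the 256-bit form, which is why the second half of the table repeats the bit fields [1:0] through [7:6].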
There are legacy and extended forms of the instruction:

SHUFPS

Shuffles eight source values. The first source operand is an XMM register. The second source
operand is either an XMM register or a 128-bit memory location. There is a third 8-bit immediate
operand. The first source register is also the destination. Bits [255:128] of the YMM register that
corresponds to the destination are not affected.

VSHUFPS

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

Shuffles eight source values. The first source operand is an XMM register. The second source
operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register.
There is a fourth 8-bit immediate operand. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.

YMM Encoding

Shuffles 16 source values. The first source operand is a YMM register and the second source operand
is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
There is a fourth 8-bit immediate operand.

Instruction Support

Form        Subset   Feature Flag
SHUFPS      SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VSHUFPS     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 


Mnemonic                                Opcode        Description
SHUFPS xmm1, xmm2/mem128, imm8          0F C6 /r ib   Shuffles packed single-precision floating-point
                                                      values in xmm1 and xmm2 or mem128. Writes the
                                                      result to xmm1.

Mnemonic                                Encoding
                                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VSHUFPS xmm1, xmm2, xmm3/mem128, imm8   C4    RXB.01           X.src1.0.00   C6 /r
VSHUFPS ymm1, ymm2, ymm3/mem256, imm8   C4    RXB.01           X.src1.1.00   C6 /r
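
The VEX fields in the table can be assembled into bytes as follows. This Python sketch is an illustration under stated assumptions: the helper names vex3_prefix and encode_vshufps_reg are hypothetical, only register operands numbered 0 to 7 are handled (so R, X, and B are zero), W is set to 0, and the imm8 byte is assumed to follow the ModRM byte as it does in the legacy form. The R, X, B, and vvvv fields are stored in inverted form.

```python
def vex3_prefix(r, x, b, map_select, w, vvvv, l, pp):
    """Build the three-byte VEX prefix (C4h form).

    r, x, b, and vvvv are passed as logical values; the encoding stores
    each of them inverted.
    """
    byte1 = (((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5)
             | (map_select & 0x1F))
    byte2 = ((w & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 0b11)
    return bytes([0xC4, byte1, byte2])


def encode_vshufps_reg(dest, src1, src2, imm8, ymm=False):
    """Encode VSHUFPS dest, src1, src2, imm8 with registers 0-7 only."""
    prefix = vex3_prefix(0, 0, 0, 0b00001, 0, src1, 1 if ymm else 0, 0b00)
    modrm = 0xC0 | ((dest & 7) << 3) | (src2 & 7)   # mod = 11b, register-direct
    return prefix + bytes([0xC6, modrm, imm8 & 0xFF])
```

Assemblers commonly emit the shorter two-byte VEX prefix when the operands allow it; the three-byte form is used here because it is the form shown in the table.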


Related Instructions 

(V)SHUFPD 


rFLAGS Affected 

None 


MXCSR Flags Affected 

None 
Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
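
The alignment-related rows of the exception table can be read as a small decision procedure. The Python sketch below models only those rows for a 128-bit memory operand; it ignores the operating-mode columns and every other exception cause, and the function name alignment_fault and its parameters are hypothetical.

```python
def alignment_fault(address, form, mxcsr_mm, alignment_check_enabled):
    """Return '#GP', '#AC', or None for a 128-bit memory operand.

    form is 'SSE' for the legacy encoding or 'AVX' for the VEX encoding.
    """
    if address % 16 == 0:
        return None                      # aligned operands never fault here
    if form == "SSE":
        if not mxcsr_mm:
            return "#GP"                 # misaligned and MXCSR.MM = 0
        return "#AC" if alignment_check_enabled else None
    if form == "AVX":
        return "#AC" if alignment_check_enabled else None
    raise ValueError("form must be 'SSE' or 'AVX'")
```

In this reading, the legacy form tolerates a misaligned operand only when MXCSR.MM = 1, while the extended form faults on misalignment only when alignment checking is enabled.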
SQRTPD, VSQRTPD

Square Root Packed Double-Precision Floating-Point

Computes the square root of each packed double-precision floating-point value in a source operand 
and writes the result to the corresponding quadword of the destination. 

Performing the square root of +infinity returns +infinity. 

There are legacy and extended forms of the instruction:

SQRTPD

Computes two values. The destination is an XMM register. The source operand is either an XMM
register or a 128-bit memory location. Bits [255:128] of the YMM register that corresponds to the
destination are not affected.

VSQRTPD

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

Computes two values. The source operand is either an XMM register or a 128-bit memory location.
The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.

YMM Encoding 

Computes four values. The source operand is either a YMM register or a 256-bit memory location. 
The destination is a YMM register. 
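
The difference between the three forms is easiest to see in how each treats bits [255:128] of the destination. The following Python sketch models a YMM register as a list of four doubles (element 0 is the low quadword). The function names are hypothetical, and the model covers non-negative inputs only, because math.sqrt raises an error for negative values instead of returning a NaN.

```python
import math


def sqrtpd_legacy(dest_ymm, src_xmm):
    """SQRTPD: two results; bits [255:128] of the destination are unchanged."""
    return [math.sqrt(v) for v in src_xmm[:2]] + list(dest_ymm[2:4])


def vsqrtpd_xmm(src_xmm):
    """VSQRTPD, XMM encoding: two results; bits [255:128] are cleared."""
    return [math.sqrt(v) for v in src_xmm[:2]] + [0.0, 0.0]


def vsqrtpd_ymm(src_ymm):
    """VSQRTPD, YMM encoding: four results fill the whole register."""
    return [math.sqrt(v) for v in src_ymm[:4]]
```

Passing +infinity as a source element returns +infinity in the corresponding result element, matching the statement above.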


Instruction Support 


Form       Subset   Feature Flag
SQRTPD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VSQRTPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                      Opcode        Description
SQRTPD xmm1, xmm2/mem128      66 0F 51 /r   Computes square roots of packed double-precision
                                            floating-point values in xmm2 or mem128. Writes the
                                            results to xmm1.

Mnemonic                      Encoding
                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VSQRTPD xmm1, xmm2/mem128     C4    RXB.01           X.1111.0.01   51 /r
VSQRTPD ymm1, ymm2/mem256     C4    RXB.01           X.1111.1.01   51 /r

Related Instructions

(V)RSQRTPS, (V)RSQRTSS, (V)SQRTPS, (V)SQRTSD, (V)SQRTSS


rFLAGS Affected 

None 


MXCSR Flags Affected 


MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M               M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.


Exceptions 


                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
                           S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                             see SIMD Floating-Point Exceptions below for details.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF   S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                             see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
Precision, PE              S     S     X     A result could not be represented exactly in the destination format.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
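
The two "Unmasked SIMD floating-point exception" rows differ only in the state of CR4.OSXMMEXCPT. A minimal Python sketch of that dispatch follows; it assumes the MXCSR bit positions shown in the MXCSR Flags Affected table (exception flags in bits 5:0, the corresponding masks in bits 12:7), and the names simd_fp_fault and FLAG_BITS are hypothetical.

```python
# Exception-flag bit positions in MXCSR; each mask bit sits 7 bits higher.
FLAG_BITS = {"IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5}


def simd_fp_fault(flag, mxcsr, cr4_osxmmexcpt):
    """Return the exception raised when `flag` is signaled, or None if masked."""
    mask_bit = FLAG_BITS[flag] + 7          # IM=7, DM=8, ZM=9, OM=10, UM=11, PM=12
    if (mxcsr >> mask_bit) & 1:
        return None                         # masked: no fault is taken
    return "#XF" if cr4_osxmmexcpt else "#UD"
```

With all mask bits set (for example, an MXCSR value of 1F80h), the function returns None for every flag.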
SQRTPS, VSQRTPS

Square Root Packed Single-Precision Floating-Point

Computes the square root of each packed single-precision floating-point value in a source operand 
and writes the result to the corresponding doubleword of the destination. 

Performing the square root of +infinity returns +infinity. 

There are legacy and extended forms of the instruction:

SQRTPS 

Computes four values. The destination is an XMM register. The source operand is either an XMM 
register or a 128-bit memory location. Bits [255:128] of the YMM register that corresponds to the 
destination are not affected. 

VSQRTPS 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding 

Computes four values. The destination is an XMM register. The source operand is either an XMM 
register or a 128-bit memory location. Bits [255:128] of the YMM register that corresponds to the 
destination are cleared. 

YMM Encoding 

Computes eight values. The destination is a YMM register. The source operand is either a YMM
register or a 256-bit memory location.


Instruction Support 



Form       Subset   Feature Flag
SQRTPS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VSQRTPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding 

Mnemonic                      Opcode        Description
SQRTPS xmm1, xmm2/mem128      0F 51 /r      Computes square roots of packed single-precision
                                            floating-point values in xmm2 or mem128. Writes the
                                            results to xmm1.

Mnemonic                      Encoding
                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VSQRTPS xmm1, xmm2/mem128     C4    RXB.01           X.1111.0.00   51 /r
VSQRTPS ymm1, ymm2/mem256     C4    RXB.01           X.1111.1.00   51 /r


Related Instructions 

(V)RSQRTPS, (V)RSQRTSS, (V)SQRTPD, (V)SQRTSD, (V)SQRTSS 


rFLAGS Affected 

None 


MXCSR Flags Affected 


MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M               M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.


Exceptions 


                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
                           S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                             see SIMD Floating-Point Exceptions below for details.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF   S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                             see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
Precision, PE              S     S     X     A result could not be represented exactly in the destination format.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
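
The AVX rows that depend on operating-system state (CR4.OSXSAVE and XFEATURE_ENABLED_MASK[2:1]) can be checked together before the extended form is used. The Python sketch below assumes the caller supplies the ECX value from CPUID function 0000_0001h and the XFEATURE_ENABLED_MASK value; the function name avx_enabled_by_os is hypothetical, and the OSXSAVE bit position (bit 27) is taken from general knowledge of the CPUID layout, not from the table above.

```python
OSXSAVE_BIT = 27   # CPUID Fn0000_0001_ECX[OSXSAVE] (assumed bit position)


def avx_enabled_by_os(cpuid_ecx, xfeature_enabled_mask):
    """Return True when neither OS-state #UD condition for AVX applies."""
    osxsave = bool((cpuid_ecx >> OSXSAVE_BIT) & 1)
    # Bits [2:1] must both be set (11b), otherwise the instruction raises #UD.
    state_enabled = ((xfeature_enabled_mask >> 1) & 0b11) == 0b11
    return osxsave and state_enabled
```

This covers only the two operating-system conditions; the AVX feature flag itself must be tested separately.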
SQRTSD, VSQRTSD

Square Root Scalar Double-Precision Floating-Point

Computes the square root of a double-precision floating-point value and writes the result to the low 
quadword of the destination. The three-operand form of the instruction also writes a copy of the upper 
quadword of a second source operand to the upper quadword of the destination. 

Performing the square root of +infinity returns +infinity. 


There are legacy and extended forms of the instruction: 

SQRTSD 

The source operand is either an XMM register or a 64-bit memory location. When the source is an
XMM register, the source value must be in the low quadword. The destination is an XMM register.
Bits [127:64] of the destination are not affected. Bits [255:128] of the YMM register that corresponds
to the destination XMM register are not affected.

VSQRTSD

The extended form of the instruction has a single 128-bit encoding that requires three operands:

VSQRTSD xmm1, xmm2, xmm3/mem64

The first source operand is an XMM register. The second source operand is either an XMM register or
a 64-bit memory location. When the second source is an XMM register, the source value must be in
the low quadword. The destination is a third XMM register. The square root of the second source
operand is written to bits [63:0] of the destination register. Bits [127:64] of the destination are copied
from the corresponding bits of the first source operand. Bits [255:128] of the YMM register that
corresponds to the destination are cleared.

Instruction Support 


Form       Subset   Feature Flag
SQRTSD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VSQRTSD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.



Instruction Encoding 

Mnemonic                         Opcode        Description
SQRTSD xmm1, xmm2/mem64          F2 0F 51 /r   Computes the square root of a double-precision floating-point
                                               value in xmm2 or mem64. Writes the result to xmm1.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VSQRTSD xmm1, xmm2, xmm3/mem64   C4    RXB.01           X.src1.X.11   51 /r


Related Instructions 

(V)RSQRTPS, (V)RSQRTSS, (V)SQRTPD, (V)SQRTPS, (V)SQRTSS 


rFLAGS Affected 

None 


MXCSR Flags Affected 


MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M               M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.


Exceptions 


                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
                           S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                             see SIMD Floating-Point Exceptions below for details.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF   S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                             see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
Precision, PE              S     S     X     A result could not be represented exactly in the destination format.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
SQRTSS, VSQRTSS

Square Root Scalar Single-Precision Floating-Point

Computes the square root of a single-precision floating-point value and writes the result to the low 
doubleword of the destination. The three-operand form of the instruction also writes a copy of the 
three most significant doublewords of a second source operand to the upper 96 bits of the destination. 

Performing the square root of +infinity returns +infinity. 


There are legacy and extended forms of the instruction: 

SQRTSS 

The source operand is either an XMM register or a 32-bit memory location. When the source is an
XMM register, the source value must be in the low doubleword. The destination is an XMM register.
Bits [127:32] of the destination are not affected. Bits [255:128] of the YMM register that corresponds
to the destination XMM register are not affected.

VSQRTSS

The extended form has a single 128-bit encoding that requires three operands:

VSQRTSS xmm1, xmm2, xmm3/mem32

The first source operand is an XMM register. The second source operand is either an XMM register or
a 32-bit memory location. When the second source is an XMM register, the source value must be in
the low doubleword. The destination is a third XMM register. The square root of the second source
operand is written to bits [31:0] of the destination register. Bits [127:32] of the destination are copied
from the corresponding bits of the first source operand. Bits [255:128] of the YMM register that
corresponds to the destination are cleared.

Instruction Support 


Form       Subset   Feature Flag
SQRTSS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VSQRTSS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.



Instruction Encoding 

Mnemonic                         Opcode        Description
SQRTSS xmm1, xmm2/mem32          F3 0F 51 /r   Computes square root of a single-precision floating-point
                                               value in xmm2 or mem32. Writes the result to xmm1.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VSQRTSS xmm1, xmm2, xmm3/mem32   C4    RXB.01           X.src1.X.10   51 /r


Related Instructions 

(V)RSQRTPS, (V)RSQRTSS, (V)SQRTPD, (V)SQRTPS, (V)SQRTSD 


rFLAGS Affected 

None 


MXCSR Flags Affected 


MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M               M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.


Exceptions 


                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
                           S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                             see SIMD Floating-Point Exceptions below for details.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF   S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                             see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
Precision, PE              S     S     X     A result could not be represented exactly in the destination format.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
STMXCSR, VSTMXCSR

Store MXCSR

Saves the content of the MXCSR extended control/status register to a 32-bit memory location. 
Reserved bits are stored as zeroes. The MXCSR is described in “Registers” in Volume 1. 

For both legacy STMXCSR and extended VSTMXCSR forms of the instruction, the source operand 
is the MXCSR and the destination is a 32-bit memory location. 

There is one encoding for each instruction form. 
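
To illustrate what the stored 32-bit value contains, the following Python sketch packs and decodes an MXCSR image. It assumes the bit positions shown in the MXCSR Flags Affected tables of this volume (IE in bit 0 through PM in bit 12, RC in bits 14:13, FZ in bit 15, MM in bit 17) and treats every other bit as reserved. The names store_mxcsr and decode_mxcsr are hypothetical, and 1F80h is used only as a sample value.

```python
import struct

MXCSR_FIELDS = {
    "IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5, "DAZ": 6,
    "IM": 7, "DM": 8, "ZM": 9, "OM": 10, "UM": 11, "PM": 12,
    "FZ": 15, "MM": 17,
}
DEFINED_BITS = 0x0002FFFF   # bits 15:0 and bit 17; all other bits reserved


def store_mxcsr(value):
    """Return the 4 bytes written to mem32; reserved bits are stored as zeroes."""
    return struct.pack("<I", value & DEFINED_BITS)


def decode_mxcsr(value):
    """Split an MXCSR image into named single-bit fields plus the 2-bit RC field."""
    fields = {name: bool((value >> bit) & 1) for name, bit in MXCSR_FIELDS.items()}
    fields["RC"] = (value >> 13) & 0b11
    return fields
```

For the sample value 1F80h, all six exception masks (IM through PM) decode as set and every status flag decodes as clear.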


Instruction Support 


Form       Subset   Feature Flag
STMXCSR    SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VSTMXCSR   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic            Opcode        Description
STMXCSR mem32       0F AE /3      Stores content of MXCSR in mem32.

Mnemonic            Encoding
                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VSTMXCSR mem32      C4    RXB.01           X.1111.0.00   AE /3

Related Instructions

(V)LDMXCSR

rFLAGS Affected

None


MXCSR Flags Affected

MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
M   M   M  M   M   M   M   M   M   M   M    M   M   M   M   M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.


Exceptions 


                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     X     Write to a read-only data segment.
                           S     S     S     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
SUBPD, VSUBPD

Subtract Packed Double-Precision Floating-Point

Subtracts each packed double-precision floating-point value of the second source operand from the
corresponding value of the first source operand and writes the difference to the corresponding
quadword of the destination.


There are legacy and extended forms of the instruction: 

SUBPD 

Subtracts two pairs of values. The first source operand is an XMM register. The second source operand
is either an XMM register or a 128-bit memory location. The first source register is also the
destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VSUBPD

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

Subtracts two pairs of values. The first source operand is an XMM register. The second source operand
is either an XMM register or a 128-bit memory location. The destination is a third XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

Subtracts four pairs of values. The first source operand is a YMM register and the second source
operand is either a YMM register or a 256-bit memory location. The destination is a third YMM
register.

Instruction Support 


Form       Subset   Feature Flag
SUBPD      SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VSUBPD     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                          Opcode        Description
SUBPD xmm1, xmm2/mem128           66 0F 5C /r   Subtracts packed double-precision floating-point values in
                                                xmm2 or mem128 from corresponding values of xmm1.
                                                Writes differences to xmm1.

Mnemonic                          Encoding
                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VSUBPD xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   5C /r
VSUBPD ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   5C /r


Related Instructions 

(V)SUBPS, (V)SUBSD, (V)SUBSS 


rFLAGS Affected 

None 


MXCSR Flags Affected 


MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M   M   M       M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.


Exceptions 


                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
                           S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                             see SIMD Floating-Point Exceptions below for details.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF   S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                             see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
Overflow, OE               S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE              S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE              S     S     X     A result could not be represented exactly in the destination format.

X - AVX and SSE exception
A - AVX exception
S - SSE exception


SUBPS, VSUBPS

Subtract Packed Single-Precision Floating-Point

Subtracts each packed single-precision floating-point value of the second source operand from the
corresponding value of the first source operand and writes the difference to the corresponding
doubleword of the destination.


There are legacy and extended forms of the instruction: 

SUBPS 

Subtracts four pairs of values. The first source operand is an XMM register. The second source operand
is either an XMM register or a 128-bit memory location. The first source register is also the
destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VSUBPS

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

Subtracts four pairs of values. The first source operand is an XMM register. The second source operand
is either an XMM register or a 128-bit memory location. The destination is a third XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

Subtracts eight pairs of values. The first source operand is a YMM register and the second source
operand is either a YMM register or a 256-bit memory location. The destination is a third YMM
register.
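
The operand order and the precision (PE) condition can be illustrated with a small Python model. This is a sketch under stated assumptions, not a description of the hardware: the names to_single and subps are hypothetical, inputs are assumed to be finite and within single-precision range, and only round-to-nearest behavior is modeled. Single-precision rounding is obtained by packing through the struct module's 32-bit float format, and the exact difference is computed with fractions.Fraction for comparison.

```python
import struct
from fractions import Fraction


def to_single(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def subps(src1, src2):
    """Lane-wise src1 - src2 in single precision.

    Returns (results, inexact), where inexact[i] is True when lane i could
    not be represented exactly, the condition reported by the PE flag.
    """
    results, inexact = [], []
    for a, b in zip(src1, src2):
        a, b = to_single(a), to_single(b)
        exact = Fraction(a) - Fraction(b)     # mathematically exact difference
        rounded = to_single(a - b)            # value delivered to the lane
        results.append(rounded)
        inexact.append(Fraction(rounded) != exact)
    return results, inexact
```

For example, subtracting 0.25 from 16777216.0 yields 16777216.0 with the inexact condition set, because 16777215.75 needs more than 24 significand bits.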


Instruction Support 


Form       Subset   Feature Flag
SUBPS      SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VSUBPS     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                          Opcode        Description
SUBPS xmm1, xmm2/mem128           0F 5C /r      Subtracts packed single-precision floating-point values in
                                                xmm2 or mem128 from corresponding values of xmm1.
                                                Writes differences to xmm1.

Mnemonic                          Encoding
                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VSUBPS xmm1, xmm2, xmm3/mem128    C4    RXB.00001        X.src.0.00    5C /r
VSUBPS ymm1, ymm2, ymm3/mem256    C4    RXB.00001        X.src.1.00    5C /r


Related Instructions 

(V)SUBPD, (V)SUBSD, (V)SUBSS 


rFLAGS Affected 

None 


MXCSR Flags Affected 


MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M   M   M       M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.


Exceptions 


                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
                           S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                             see SIMD Floating-Point Exceptions below for details.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                             and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF   S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                             see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
Overflow, OE               S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE              S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE              S     S     X     A result could not be represented exactly in the destination format.

X - AVX and SSE exception
A - AVX exception
S - SSE exception
SUBSD, VSUBSD

Subtract Scalar Double-Precision Floating-Point

Subtracts the double-precision floating-point value in the low-order quadword of the second source
operand from the corresponding value in the first source operand and writes the result to the
low-order quadword of the destination.


There are legacy and extended forms of the instruction: 

SUBSD 

The first source operand is an XMM register and the second source operand is either an XMM register
or a 64-bit memory location. The first source register is also the destination register. Bits [127:64]
of the destination and bits [255:128] of the corresponding YMM register are not affected.

VSUBSD

The extended form of the instruction has a 128-bit encoding only.

The first source operand is an XMM register and the second source operand is either an XMM register
or a 64-bit memory location. The destination is a third XMM register. Bits [127:64] of the first
source operand are copied to bits [127:64] of the destination. Bits [255:128] of the YMM register that
corresponds to the destination are cleared.

Instruction Support 


Form       Subset   Feature Flag
SUBSD      SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VSUBSD     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding 




Mnemonic                         Opcode        Description
SUBSD xmm1, xmm2/mem64           F2 0F 5C /r   Subtracts low-order double-precision floating-point value in
                                               xmm2 or mem64 from the corresponding value of xmm1.
                                               Writes the difference to xmm1.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VSUBSD xmm1, xmm2, xmm3/mem64    C4    RXB.01           X.src1.X.11   5C /r


Related Instructions 

(V)SUBPD, (V)SUBPS, (V)SUBSS 

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M    M         M    M
17   15   14:13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 

Exception                    Mode               Cause of Exception
                             Real  Virt  Prot
Invalid opcode, #UD          X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                             A     A            AVX instructions are only recognized in protected mode.
                             S     S     S      CR0.EM = 1.
                             S     S     S      CR4.OSFXSR = 0.
                                         A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X      Lock prefix (F0h) preceding opcode.
                             S     S     X      Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                                see SIMD Floating-Point Exceptions below for details.
Device not available, #NM    S     S     X      CR0.TS = 1.
Stack, #SS                   S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X      Memory address exceeding data segment limit or non-canonical.
                                         X      Null data segment used to reference memory.
Page fault, #PF                    S     X      Instruction execution caused a page fault.
Alignment check, #AC               S     X      Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF     S     S     X      Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                                see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE        S     S     X      A source operand was an SNaN value.
                             S     S     X      Undefined operation.
Denormalized operand, DE     S     S     X      A source operand was a denormal value.
Overflow, OE                 S     S     X      Rounded result too large to fit into the format of the destination operand.
Underflow, UE                S     S     X      Rounded result too small to fit into the format of the destination operand.
Precision, PE                S     S     X      A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 




26568 — Rev. 3.23—February 2019 


SUBSS, VSUBSS
Subtract Scalar Single-Precision Floating-Point

Subtracts the single-precision floating-point value in the low-order doubleword of the second source 
operand from the corresponding value in the first source operand and writes the result to the 
low-order doubleword of the destination.


There are legacy and extended forms of the instruction: 

SUBSS 

The first source operand is an XMM register and the second source operand is either an XMM register 
or a 32-bit memory location. The first source register is also the destination register. Bits [127:32] 
of the destination and bits [255:128] of the corresponding YMM register are not affected.

VSUBSS 

The extended form of the instruction has a 128-bit encoding only. 

The first source operand is an XMM register and the second source operand is either an XMM register 
or a 32-bit memory location. The destination is a third XMM register. Bits [127:32] of the first 
source operand are copied to bits [127:32] of the destination. Bits [255:128] of the YMM register that 
corresponds to the destination are cleared.
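The single-precision forms differ from SUBSD mainly in lane width and in rounding the result to 32 bits. The sketch below assumes a YMM register is modeled as a list of eight single-precision lanes (lane 0 holds bits [31:0]) and uses `struct` to round a Python double to the nearest single-precision value. The names `to_single`, `subss`, and `vsubss` are illustrative assumptions; only the default round-to-nearest mode is modeled, and inputs are assumed to be exactly representable in single precision.

```python
import struct


def to_single(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def subss(dest, src):
    """Model of legacy SUBSS: only bits [31:0] of the destination change."""
    result = list(dest)
    result[0] = to_single(dest[0] - src[0])
    return result


def vsubss(src1, src2):
    """Model of VSUBSS: bits [127:32] come from src1, bits [255:128] are cleared."""
    return [to_single(src1[0] - src2[0])] + list(src1[1:4]) + [0.0] * 4


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [2.0 ** -26, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# 1.0 - 2**-26 is not representable in single precision and rounds back to 1.0,
# the kind of inexact result that the Precision (PE) row in the table describes.
print(subss(a, b)[0])  # 1.0
print(vsubss(a, b))    # [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
```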

Instruction Support 

Form      Subset   Feature Flag
SUBSS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VSUBSS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                        Opcode        Description
SUBSS xmm1, xmm2/mem32          F3 0F 5C /r   Subtracts a low-order single-precision floating-point value
                                              in xmm2 or mem32 from the corresponding value of xmm1.
                                              Writes the difference to xmm1.

Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VSUBSS xmm1, xmm2, xmm3/mem32   C4    RXB.01           X.src1.X.10   5C /r

Related Instructions 

(V)SUBPD, (V)SUBPS, (V)SUBSD 

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M    M         M    M
17   15   14:13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. 


Exceptions 

Exception                    Mode               Cause of Exception
                             Real  Virt  Prot
Invalid opcode, #UD          X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                             A     A            AVX instructions are only recognized in protected mode.
                             S     S     S      CR0.EM = 1.
                             S     S     S      CR4.OSFXSR = 0.
                                         A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X      Lock prefix (F0h) preceding opcode.
                             S     S     X      Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                                see SIMD Floating-Point Exceptions below for details.
Device not available, #NM    S     S     X      CR0.TS = 1.
Stack, #SS                   S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X      Memory address exceeding data segment limit or non-canonical.
                                         X      Null data segment used to reference memory.
Page fault, #PF                    S     X      Instruction execution caused a page fault.
Alignment check, #AC               S     X      Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF     S     S     X      Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                                see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE        S     S     X      A source operand was an SNaN value.
                             S     S     X      Undefined operation.
Denormalized operand, DE     S     S     X      A source operand was a denormal value.
Overflow, OE                 S     S     X      Rounded result too large to fit into the format of the destination operand.
Underflow, UE                S     S     X      Rounded result too small to fit into the format of the destination operand.
Precision, PE                S     S     X      A result could not be represented exactly in the destination format.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
UCOMISD, VUCOMISD
Unordered Compare Scalar Double-Precision Floating-Point

Performs an unordered comparison of a double-precision floating-point value in the low-order 64 bits 
of an XMM register with a double-precision floating-point value in the low-order 64 bits of an XMM 
register or a 64-bit memory location. 

The ZF, PF, and CF bits in the rFLAGS register reflect the result of the compare as follows. 


Result of Compare   ZF   PF   CF
Unordered           1    1    1
Greater Than        0    0    0
Less Than           0    0    1
Equal               1    0    0


The OF, AF, and SF bits in rFLAGS are cleared. If the instruction causes an unmasked SIMD 
floating-point exception (#XF), the rFLAGS bits are not updated. 

The result is unordered when one or both of the operand values is a NaN. UCOMISD signals a SIMD 
floating-point invalid operation exception (#I) only when a source operand is an SNaN.

The legacy and extended forms of the instruction operate in the same way. 
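The flag mapping in the table above can be expressed as a small function. This is only a sketch of the comparison result, assuming the two scalar operands are given as Python floats; the name `ucomisd_flags` is hypothetical, and exception signaling (for example on SNaN operands) is not modeled.

```python
import math


def ucomisd_flags(a, b):
    """Return (ZF, PF, CF) for an unordered compare of a (first operand) with b."""
    if math.isnan(a) or math.isnan(b):
        return (1, 1, 1)  # Unordered
    if a > b:
        return (0, 0, 0)  # Greater Than
    if a < b:
        return (0, 0, 1)  # Less Than
    return (1, 0, 0)      # Equal


print(ucomisd_flags(2.0, 1.0))           # (0, 0, 0)
print(ucomisd_flags(1.0, float("nan")))  # (1, 1, 1)
```

Because the unordered case sets all three flags, code that branches on the result typically needs to examine PF before relying on ZF or CF alone.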


Instruction Support 

Form        Subset   Feature Flag
UCOMISD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VUCOMISD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3.



Instruction Encoding 

Mnemonic                     Opcode        Description
UCOMISD xmm1, xmm2/mem64     66 0F 2E /r   Compares scalar double-precision floating-point values
                                           in xmm1 and xmm2 or mem64. Sets rFLAGS.

Mnemonic                     Encoding
                             VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VUCOMISD xmm1, xmm2/mem64    C4    RXB.00001        X.1111.X.01   2E /r


Related Instructions 

(V)CMPPD, (V)CMPPS, (V)CMPSD, (V)CMPSS, (V)COMISD, (V)COMISS, (V)UCOMISS 


rFLAGS Affected 


ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          0                   0    M    0    M    M
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set or cleared is M (modified). Unaffected flags are blank. 

Note: If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                                          M    M
17   15   14:13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 


Exceptions 

Exception                    Mode               Cause of Exception
                             Real  Virt  Prot
Invalid opcode, #UD          X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                             A     A            AVX instructions are only recognized in protected mode.
                             S     S     S      CR0.EM = 1.
                             S     S     S      CR4.OSFXSR = 0.
                                         A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X      Lock prefix (F0h) preceding opcode.
                             S     S     X      Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                                see SIMD Floating-Point Exceptions below for details.
Device not available, #NM    S     S     X      CR0.TS = 1.
Stack, #SS                   S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X      Memory address exceeding data segment limit or non-canonical.
                                         X      Null data segment used to reference memory.
Page fault, #PF                    S     X      Instruction execution caused a page fault.
Alignment check, #AC               S     X      Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF     S     S     X      Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                                see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE        S     S     X      A source operand was an SNaN value.
                             S     S     X      Undefined operation.
Denormalized operand, DE     S     S     X      A source operand was a denormal value.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
UCOMISS, VUCOMISS
Unordered Compare Scalar Single-Precision Floating-Point

Performs an unordered comparison of a single-precision floating-point value in the low-order 32 bits 
of an XMM register with a single-precision floating-point value in the low-order 32 bits of an XMM 
register or a 32-bit memory location. 

The ZF, PF, and CF bits in the rFLAGS register reflect the result of the compare as follows. 


Result of Compare   ZF   PF   CF
Unordered           1    1    1
Greater Than        0    0    0
Less Than           0    0    1
Equal               1    0    0


The OF, AF, and SF bits in rFLAGS are cleared. If the instruction causes an unmasked SIMD 
floating-point exception (#XF), the rFLAGS bits are not updated. 

The result is unordered when one or both of the operand values is a NaN. UCOMISS signals a SIMD 
floating-point invalid operation exception (#I) only when a source operand is an SNaN.

The legacy and extended forms of the instruction operate in the same way. 


Instruction Support 

Form        Subset   Feature Flag
UCOMISS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VUCOMISS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                     Opcode        Description
UCOMISS xmm1, xmm2/mem32     0F 2E /r      Compares scalar single-precision floating-point values
                                           in xmm1 and xmm2 or mem32. Sets rFLAGS.

Mnemonic                     Encoding
                             VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VUCOMISS xmm1, xmm2/mem32    C4    RXB.01           X.1111.X.00   2E /r

Related Instructions 

(V)CMPPD, (V)CMPPS, (V)CMPSD, (V)CMPSS, (V)COMISD, (V)COMISS, (V)UCOMISD 
rFLAGS Affected 

ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          0                   0    M    0    M    M
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set or cleared is M (modified). Unaffected flags are blank. 

Note: If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                                          M    M
17   15   14:13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 


Exceptions 

Exception                    Mode               Cause of Exception
                             Real  Virt  Prot
Invalid opcode, #UD          X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                             A     A            AVX instructions are only recognized in protected mode.
                             S     S     S      CR0.EM = 1.
                             S     S     S      CR4.OSFXSR = 0.
                                         A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X      Lock prefix (F0h) preceding opcode.
                             S     S     X      Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                                see SIMD Floating-Point Exceptions below for details.
Device not available, #NM    S     S     X      CR0.TS = 1.
Stack, #SS                   S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X      Memory address exceeding data segment limit or non-canonical.
                                         X      Null data segment used to reference memory.
Page fault, #PF                    S     X      Instruction execution caused a page fault.
Alignment check, #AC               S     X      Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF     S     S     X      Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                                see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE        S     S     X      A source operand was an SNaN value.
                             S     S     X      Undefined operation.
Denormalized operand, DE     S     S     X      A source operand was a denormal value.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 


UNPCKHPD, VUNPCKHPD
Unpack High Double-Precision Floating-Point

Unpacks the high-order double-precision floating-point values of the first and second source operands 
and interleaves the values into the destination. Bits [63:0] of the source operands are ignored.

Values are interleaved in ascending order from the lsb of the sources and the destination. Bits 
[127:64] of the first source are written to bits [63:0] of the destination; bits [127:64] of the second 
source are written to bits [127:64] of the destination. For the 256-bit encoding, the process is repeated 
for bits [255:192] of the sources and bits [255:128] of the destination. 
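The interleave described above can be sketched as follows, assuming each register is modeled as a list of quadword lanes (two lanes for XMM, four for YMM, with lane 0 holding bits [63:0]). The function name `unpckhpd` and the list representation are illustrative assumptions; handling of the destination's upper bits in the legacy form is not modeled.

```python
def unpckhpd(src1, src2):
    """Interleave the high quadword of each 128-bit half of src1 and src2."""
    result = []
    for base in range(0, len(src1), 2):  # one iteration per 128-bit half
        result.append(src1[base + 1])    # high quadword of the first source
        result.append(src2[base + 1])    # high quadword of the second source
    return result


print(unpckhpd([0.0, 1.0], [10.0, 11.0]))
# [1.0, 11.0]
print(unpckhpd([0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]))
# [1.0, 11.0, 3.0, 13.0]
```

Note that the 256-bit form works on each 128-bit half independently; values never cross from the lower half to the upper half.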


There are legacy and extended forms of the instruction: 

UNPCKHPD 

Interleaves one pair of values. The first source operand is an XMM register and the second source 
operand is either an XMM register or a 128-bit memory location. The first source register is also the 
destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. 

VUNPCKHPD 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Interleaves one pair of values. The first source operand is an XMM register and the second source 
operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. 
Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

Interleaves two pairs of values. The first source operand is a YMM register and the second source 
operand is either a YMM register or a 256-bit memory location. The destination is a third YMM 
register.

Instruction Support 

Form         Subset   Feature Flag
UNPCKHPD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VUNPCKHPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                             Opcode        Description
UNPCKHPD xmm1, xmm2/mem128           66 0F 15 /r   Unpacks the high-order double-precision floating-point
                                                   values in xmm1 and xmm2 or mem128 and
                                                   interleaves them into xmm1.

Mnemonic                             Encoding
                                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VUNPCKHPD xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   15 /r
VUNPCKHPD ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   15 /r

Related Instructions 

(V)UNPCKHPS, (V)UNPCKLPD, (V)UNPCKLPS 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 

Exception                    Mode               Cause of Exception
                             Real  Virt  Prot
Invalid opcode, #UD          X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                             A     A            AVX instructions are only recognized in protected mode.
                             S     S     S      CR0.EM = 1.
                             S     S     S      CR4.OSFXSR = 0.
                                         A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X      CR0.TS = 1.
Stack, #SS                   S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X      Memory address exceeding data segment limit or non-canonical.
                             S     S     S      Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                         X      Null data segment used to reference memory.
Alignment check, #AC         S     S     S      Memory operand not 16-byte aligned when alignment checking enabled
                                                and MXCSR.MM = 1.
                                         A      Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                    S     X      Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 


UNPCKHPS, VUNPCKHPS
Unpack High Single-Precision Floating-Point

Unpacks the high-order single-precision floating-point values of the first and second source operands 
and interleaves the values into the destination. Bits [63:0] of the source operands are ignored. 

Values are interleaved in ascending order from the lsb of the sources and the destination. Bits [95:64] 
of the first source are written to bits [31:0] of the destination; bits [95:64] of the second source are 
written to bits [63:32] of the destination and so on, ending with bits [127:96] of the second source in 
bits [127:96] of the destination. For the 256-bit encoding, the process continues for bits [255:192] of 
the sources and bits [255:128] of the destination. 
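As a concrete illustration of the ordering described above, the sketch below models each register as a list of doubleword lanes (four lanes for XMM, eight for YMM, with lane 0 holding bits [31:0]). The function name `unpckhps` and the list representation are illustrative assumptions; handling of the destination's upper bits in the legacy form is not modeled.

```python
def unpckhps(src1, src2):
    """Interleave the two high doublewords of each 128-bit half of src1 and src2."""
    result = []
    for base in range(0, len(src1), 4):  # one iteration per 128-bit half
        result += [
            src1[base + 2],  # bits [95:64] of the first source
            src2[base + 2],  # bits [95:64] of the second source
            src1[base + 3],  # bits [127:96] of the first source
            src2[base + 3],  # bits [127:96] of the second source
        ]
    return result


print(unpckhps([0, 1, 2, 3], [10, 11, 12, 13]))
# [2, 12, 3, 13]
print(unpckhps(list(range(8)), list(range(10, 18))))
# [2, 12, 3, 13, 6, 16, 7, 17]
```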


There are legacy and extended forms of the instruction: 

UNPCKHPS 

Interleaves two pairs of values. The first source operand is an XMM register and the second source 
operand is either an XMM register or a 128-bit memory location. The first source register is also the 
destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. 

VUNPCKHPS 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Interleaves two pairs of values. The first source operand is an XMM register and the second source 
operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. 
Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

Interleaves four pairs of values. The first source operand is a YMM register and the second source 
operand is either a YMM register or a 256-bit memory location. The destination is a third YMM 
register.

Instruction Support 

Form         Subset   Feature Flag
UNPCKHPS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VUNPCKHPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3.
Instruction Encoding 

Mnemonic                             Opcode        Description
UNPCKHPS xmm1, xmm2/mem128           0F 15 /r      Unpacks the high-order single-precision floating-point
                                                   values in xmm1 and xmm2 or mem128 and
                                                   interleaves them into xmm1.

Mnemonic                             Encoding
                                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VUNPCKHPS xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.00   15 /r
VUNPCKHPS ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.00   15 /r


Related Instructions 

(V)UNPCKHPD, (V)UNPCKLPD, (V)UNPCKLPS 

rFLAGS Affected 

None 


MXCSR Flags Affected 

None 


Exceptions 

Exception                    Mode               Cause of Exception
                             Real  Virt  Prot
Invalid opcode, #UD          X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                             A     A            AVX instructions are only recognized in protected mode.
                             S     S     S      CR0.EM = 1.
                             S     S     S      CR4.OSFXSR = 0.
                                         A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X      CR0.TS = 1.
Stack, #SS                   S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X      Memory address exceeding data segment limit or non-canonical.
                             S     S     S      Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                         X      Null data segment used to reference memory.
Alignment check, #AC         S     S     S      Memory operand not 16-byte aligned when alignment checking enabled
                                                and MXCSR.MM = 1.
                                         A      Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                    S     X      Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 


UNPCKLPD, VUNPCKLPD
Unpack Low Double-Precision Floating-Point

Unpacks the low-order double-precision floating-point values of the first and second source operands 
and interleaves the values into the destination. Bits [127:64] of the source operands are ignored. 

Values are interleaved in ascending order from the lsb of the sources and the destination. Bits [63:0] 
of the first source are written to bits [63:0] of the destination; bits [63:0] of the second source are 
written to bits [127:64] of the destination. For the 256-bit encoding, the process is repeated for bits 
[191:128] of the sources and bits [255:128] of the destination.
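The low-order interleave can be sketched in the same way, assuming each register is modeled as a list of quadword lanes (two for XMM, four for YMM, with lane 0 holding bits [63:0]). The function name `unpcklpd` is an illustrative assumption; handling of the destination's upper bits in the legacy form is not modeled.

```python
def unpcklpd(src1, src2):
    """Interleave the low quadword of each 128-bit half of src1 and src2."""
    result = []
    for base in range(0, len(src1), 2):  # one iteration per 128-bit half
        result.append(src1[base])        # low quadword of the first source
        result.append(src2[base])        # low quadword of the second source
    return result


print(unpcklpd([0.0, 1.0], [10.0, 11.0]))
# [0.0, 10.0]
print(unpcklpd([0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]))
# [0.0, 10.0, 2.0, 12.0]
```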


There are legacy and extended forms of the instruction: 

UNPCKLPD 

Interleaves one pair of values. The first source operand is an XMM register and the second source 
operand is either an XMM register or a 128-bit memory location. The first source register is also the 
destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. 

VUNPCKLPD 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Interleaves one pair of values. The first source operand is an XMM register and the second source 
operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. 
Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

Interleaves two pairs of values. The first source operand is a YMM register and the second source 
operand is either a YMM register or a 256-bit memory location. The destination is a third YMM 
register.

Instruction Support 

Form         Subset   Feature Flag
UNPCKLPD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VUNPCKLPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                             Opcode        Description
UNPCKLPD xmm1, xmm2/mem128           66 0F 14 /r   Unpacks the low-order double-precision floating-point
                                                   values in xmm1 and xmm2 or mem128 and
                                                   interleaves them into xmm1.

Mnemonic                             Encoding
                                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VUNPCKLPD xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   14 /r
VUNPCKLPD ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   14 /r

Related Instructions 

(V)UNPCKHPD, (V)UNPCKHPS, (V)UNPCKLPS 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 

Exception                    Mode               Cause of Exception
                             Real  Virt  Prot
Invalid opcode, #UD          X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                             A     A            AVX instructions are only recognized in protected mode.
                             S     S     S      CR0.EM = 1.
                             S     S     S      CR4.OSFXSR = 0.
                                         A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X      CR0.TS = 1.
Stack, #SS                   S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X      Memory address exceeding data segment limit or non-canonical.
                             S     S     S      Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                         X      Null data segment used to reference memory.
Alignment check, #AC         S     S     S      Memory operand not 16-byte aligned when alignment checking enabled
                                                and MXCSR.MM = 1.
                                         A      Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                    S     X      Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
UNPCKLPS, VUNPCKLPS
Unpack Low Single-Precision Floating-Point

Unpacks the low-order single-precision floating-point values of the first and second source operands 
and interleaves the values into the destination. Bits [127:64] of the source operands are ignored. 

Values are interleaved in ascending order from the lsb of the sources and the destination. Bits [31:0] 
of the first source are written to bits [31:0] of the destination; bits [31:0] of the second source are 
written to bits [63:32] of the destination and so on, ending with bits [63:32] of the second source in bits 
[127:96] of the destination. For the 256-bit encoding, the process continues for bits [191:128] of the 
sources and bits [255:128] of the destination.
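The ordering described above can be sketched as follows, assuming each register is modeled as a list of doubleword lanes (four for XMM, eight for YMM, with lane 0 holding bits [31:0]). The function name `unpcklps` is an illustrative assumption; handling of the destination's upper bits in the legacy form is not modeled.

```python
def unpcklps(src1, src2):
    """Interleave the two low doublewords of each 128-bit half of src1 and src2."""
    result = []
    for base in range(0, len(src1), 4):  # one iteration per 128-bit half
        result += [
            src1[base],      # bits [31:0] of the first source
            src2[base],      # bits [31:0] of the second source
            src1[base + 1],  # bits [63:32] of the first source
            src2[base + 1],  # bits [63:32] of the second source
        ]
    return result


print(unpcklps([0, 1, 2, 3], [10, 11, 12, 13]))
# [0, 10, 1, 11]
print(unpcklps(list(range(8)), list(range(10, 18))))
# [0, 10, 1, 11, 4, 14, 5, 15]
```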


There are legacy and extended forms of the instruction: 

UNPCKLPS 

Interleaves two pairs of values. The first source operand is an XMM register and the second source 
operand is either an XMM register or a 128-bit memory location. The first source register is also the 
destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. 

VUNPCKLPS 

The extended form of the instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Interleaves two pairs of values. The first source operand is an XMM register and the second source 
operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. 
Bits [255:128] of the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

Interleaves four pairs of values. The first source operand is a YMM register and the second source 
operand is either a YMM register or a 256-bit memory location. The destination is a third YMM 
register.

Instruction Support 

Form         Subset   Feature Flag
UNPCKLPS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VUNPCKLPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3.
Instruction Encoding 

Mnemonic                             Opcode        Description
UNPCKLPS xmm1, xmm2/mem128           0F 14 /r      Unpacks the low-order single-precision floating-point
                                                   values in xmm1 and xmm2 or mem128 and
                                                   interleaves them into xmm1.

Mnemonic                             Encoding
                                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VUNPCKLPS xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.00   14 /r
VUNPCKLPS ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.00   14 /r

Related Instructions 

(V)UNPCKHPD, (V)UNPCKHPS, (V)UNPCKLPD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 

Exception                    Mode               Cause of Exception
                             Real  Virt  Prot
Invalid opcode, #UD          X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                             A     A            AVX instructions are only recognized in protected mode.
                             S     S     S      CR0.EM = 1.
                             S     S     S      CR4.OSFXSR = 0.
                                         A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X      CR0.TS = 1.
Stack, #SS                   S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X      Memory address exceeding data segment limit or non-canonical.
                             S     S     S      Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                         X      Null data segment used to reference memory.
Alignment check, #AC         S     S     S      Memory operand not 16-byte aligned when alignment checking enabled
                                                and MXCSR.MM = 1.
                                         A      Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                    S     X      Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 




VBROADCASTF128
Load With Broadcast From 128-bit Memory Location

Loads double-precision floating-point data from a 128-bit memory location and writes it to the two 
128-bit elements of a YMM register.

This extended-form instruction has a single 256-bit encoding. 

The source operand is a 128-bit memory location. The destination is a YMM register. 
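The broadcast amounts to duplicating 16 bytes of memory into both halves of the destination. The following sketch assumes the memory operand is given as a Python `bytes` object and the YMM register is modeled as 32 bytes, with the low half first. The function name `vbroadcastf128` is an illustrative assumption; exception behavior is not modeled.

```python
def vbroadcastf128(mem128):
    """Return the 32-byte YMM image produced by broadcasting a 16-byte source."""
    if len(mem128) != 16:
        raise ValueError("source must be exactly 16 bytes (128 bits)")
    # The same 128 bits are written to bits [127:0] and bits [255:128].
    return bytes(mem128) * 2


source = bytes(range(16))
ymm = vbroadcastf128(source)
print(len(ymm))                                  # 32
print(ymm[:16] == source and ymm[16:] == source)  # True
```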


Instruction Support 

Form              Subset   Feature Flag
VBROADCASTF128    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                       Encoding
                               VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VBROADCASTF128 ymm1, mem128    C4    RXB.02           0.1111.1.01   1A /r

Related Instructions 

VBROADCASTSD, VBROADCASTSS 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 

Exception                    Mode               Cause of Exception
                             Real  Virt  Prot
Invalid opcode, #UD                      A      Instruction not supported, as indicated by CPUID feature identifier.
                             A     A            AVX instructions are only recognized in protected mode.
                                         A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A      VEX.W = 1.
                                         A      VEX.vvvv != 1111b.
                                         A      VEX.L = 0.
                                         A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                                         A      Lock prefix (F0h) preceding opcode.
Device not available, #NM                A      CR0.TS = 1.
Stack, #SS                               A      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  A      Memory address exceeding data segment limit or non-canonical.
                                         A      Null data segment used to reference memory.
Page fault, #PF                          A      Instruction execution caused a page fault.
Alignment check, #AC                     A      Unaligned memory reference when alignment checking enabled.

A — AVX exception. 


Instruction Reference 
VBROADCASTI128 Load With Broadcast Integer From 128-bit Memory Location

Loads data from a 128-bit memory location and writes it to the two 128-bit elements of a YMM
register.

There is a single form of this instruction:

VBROADCASTI128 dest, mem128

There is a single VEX.L = 1 encoding of this instruction.

The source operand is a 128-bit memory location. The destination is a YMM register. 


Instruction Support

Form              Subset   Feature Flag
VBROADCASTI128    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)


Instruction Encoding

Mnemonic                        Encoding
                                VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VBROADCASTI128 ymm1, mem128     C4   RXB.02          0.1111.1.01  5A /r


Related Instructions 

VBROADCASTF128, VEXTRACTF128, VEXTRACTI128, VINSERTF128, VINSERTI128 

rFLAGS Affected 

None 


MXCSR Flags Affected 

None 




Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     A    Instruction not supported, as indicated by CPUID feature identifier.
                            A     A          AVX instructions are only recognized in protected mode.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    VEX.W = 1.
                                        A    VEX.vvvv != 1111b.
                                        A    VEX.L = 0.
                                        A    Register-based source operand specified (MODRM.mod = 11b).
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A    Lock prefix (F0h) preceding opcode.
Device not available, #NM               A    CR0.TS = 1.
Stack, #SS                              A    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A    Memory address exceeding data segment limit or non-canonical.
                                        A    Null data segment used to reference memory.
Page fault, #PF                         A    Instruction execution caused a page fault.
Alignment check, #AC                    A    Unaligned memory reference when alignment checking enabled.

A - AVX exception.

VBROADCASTSD Load With Broadcast Scalar Double

Loads a double-precision floating-point value from a register or memory and writes it to the four
64-bit elements of a YMM register.

This extended-form instruction has a single 256-bit encoding. 

The source operand is the lower half of an XMM register or a 64-bit memory location. The
destination is a YMM register.

Instruction Support

Form                        Subset   Feature Flag
VBROADCASTSD ymm1, mem64    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VBROADCASTSD ymm1, xmm      AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
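
The feature-flag checks in the table above amount to testing single bits in the register values returned by CPUID. The following Python sketch shows only the bit tests; it does not execute CPUID, and the function name vbroadcastsd_forms and its parameters (the ECX value from function 0000_0001h and the EBX value from function 0000_0007h, subfunction 0) are hypothetical names chosen for the example.

```python
AVX_BIT = 28    # CPUID Fn0000_0001_ECX[AVX]
AVX2_BIT = 5    # CPUID Fn0000_0007_EBX[AVX2]_x0


def vbroadcastsd_forms(fn1_ecx: int, fn7_ebx: int) -> dict:
    """Report which VBROADCASTSD forms the feature flags advertise."""
    has_avx = bool((fn1_ecx >> AVX_BIT) & 1)
    has_avx2 = bool((fn7_ebx >> AVX2_BIT) & 1)
    return {
        "VBROADCASTSD ymm1, mem64": has_avx,   # memory source form
        "VBROADCASTSD ymm1, xmm": has_avx2,    # register source form
    }


# Hypothetical register values: AVX reported, AVX2 not reported.
forms = vbroadcastsd_forms(fn1_ecx=1 << AVX_BIT, fn7_ebx=0)
```

A set feature flag is not sufficient on its own: as the exception table for this instruction shows, the instruction still raises #UD unless CR4.OSXSAVE = 1 and XFEATURE_ENABLED_MASK[2:1] = 11b.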

Instruction Encoding

Mnemonic                          Encoding
                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VBROADCASTSD ymm1, xmm2/mem64     C4   RXB.02          0.1111.1.01  19 /r

Related Instructions 

VBROADCASTF128, VBROADCASTSS 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     A    Instruction not supported, as indicated by CPUID feature identifier.
                            A     A          AVX instructions are only recognized in protected mode.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    VEX.W = 1.
                                        A    VEX.vvvv != 1111b.
                                        A    VEX.L = 0.
                                        A    Register-based source operand specified when AVX2 not supported.
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A    Lock prefix (F0h) preceding opcode.
Device not available, #NM               A    CR0.TS = 1.
Stack, #SS                              A    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A    Memory address exceeding data segment limit or non-canonical.
                                        A    Null data segment used to reference memory.
Page fault, #PF                         A    Instruction execution caused a page fault.
Alignment check, #AC                    A    Unaligned memory reference when alignment checking enabled.

A - AVX, AVX2 exception.

VBROADCASTSS Load With Broadcast Scalar Single

Loads a single-precision floating-point value from a register or memory and writes it to all 4 or 8
doublewords of an XMM or YMM register.

This extended-form instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Copies the source operand to all four 32-bit elements of the destination. 

The source operand is the least-significant 32 bits of an XMM register or a 32-bit memory location. 
The destination is an XMM register. 

YMM Encoding 

Copies the source operand to all eight 32-bit elements of the destination. 

The source operand is the least-significant 32 bits of an XMM register or a 32-bit memory location. 
The destination is a YMM register. 
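
The two encodings differ only in how many 32-bit elements receive the source value. A minimal Python model, under the assumption that a register can be represented as a tuple of single-precision values (the function name broadcast_ss is hypothetical):

```python
import struct


def broadcast_ss(value: float, vex_l: int) -> tuple:
    """Model VBROADCASTSS: 4 elements when VEX.L = 0 (XMM), 8 when VEX.L = 1 (YMM)."""
    count = 8 if vex_l else 4
    packed = struct.pack("<f", value)            # the 32-bit source element
    return struct.unpack("<%df" % count, packed * count)


xmm_result = broadcast_ss(1.5, vex_l=0)
ymm_result = broadcast_ss(1.5, vex_l=1)
```

The source element is copied bit for bit; no arithmetic is performed on it, which is consistent with the instruction leaving the MXCSR flags unaffected.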

Instruction Support

Form                  Subset   Feature Flag
VBROADCASTSS mem32    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VBROADCASTSS xmm      AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                          Encoding
                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VBROADCASTSS xmm1, xmm2/mem32     C4   RXB.02          0.1111.0.01  18 /r
VBROADCASTSS ymm1, xmm2/mem32     C4   RXB.02          0.1111.1.01  18 /r


Related Instructions 

VBROADCASTF128, VBROADCASTSD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     A    Instruction not supported, as indicated by CPUID feature identifier.
                            A     A          AVX instructions are only recognized in protected mode.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    VEX.W = 1.
                                        A    VEX.vvvv != 1111b.
                                        A    MODRM.mod = 11b when AVX2 not supported.
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A    Lock prefix (F0h) preceding opcode.
Device not available, #NM               A    CR0.TS = 1.
Stack, #SS                              A    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A    Memory address exceeding data segment limit or non-canonical.
                                        A    Null data segment used to reference memory.
Page fault, #PF                         A    Instruction execution caused a page fault.
Alignment check, #AC                    A    Unaligned memory reference when alignment checking enabled.

A - AVX, AVX2 exception.

VCVTPH2PS Convert Packed 16-Bit Floating-Point to Single-Precision Floating-Point

Converts packed 16-bit floating-point values to single-precision floating-point values.

A denormal source operand is converted to a normal result in the destination register. MXCSR.DAZ
is ignored and no MXCSR denormal exception is reported.

Because the full range of 16-bit floating-point encodings, including denormal encodings, can be
represented exactly in single-precision format, rounding, inexact results, and denormalized results
are not applicable.
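
The exactness claim above can be checked in software. The Python sketch below assumes the 16-bit format is IEEE 754 binary16, which the struct module decodes with its "e" format code; the helper names are hypothetical. It walks every finite 16-bit encoding, confirms that rounding the value to single precision changes nothing, and then inspects the smallest 16-bit denormal.

```python
import struct


def half_bits_to_float(bits16: int) -> float:
    """Decode a 16-bit floating-point encoding (assumed IEEE 754 binary16)."""
    return struct.unpack("<e", struct.pack("<H", bits16))[0]


def to_single(value: float) -> float:
    """Round a Python float to single precision and read it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def is_finite_half(bits16: int) -> bool:
    # An exponent field of all ones marks an infinity or a NaN.
    return (bits16 >> 10) & 0x1F != 0x1F


# True when no finite 16-bit value changes when stored as single precision.
all_exact = all(
    to_single(half_bits_to_float(bits)) == half_bits_to_float(bits)
    for bits in range(0x10000)
    if is_finite_half(bits)
)

# The smallest 16-bit denormal becomes a normal single-precision number:
# its single-precision exponent field is non-zero.
smallest_denormal = half_bits_to_float(0x0001)
single_bits = struct.unpack("<I", struct.pack("<f", smallest_denormal))[0]
single_exponent_field = (single_bits >> 23) & 0xFF
```

Infinities and NaNs are skipped here only because NaN never compares equal to itself; the sketch makes no statement about how the instruction treats them.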

The operation of this instruction is illustrated in the following diagram. 

This extended-form instruction has both 128-bit and 256-bit encodings:

XMM Encoding 

Converts four packed 16-bit floating-point values in the low-order 64 bits of an XMM register or in a 
64-bit memory location to four packed single-precision floating-point values and writes the converted 
values to an XMM destination register. When the result operand is written to the destination register, 
the upper 128 bits of the corresponding YMM register are zeroed. 

YMM Encoding

Converts eight packed 16-bit floating-point values in the low-order 128 bits of a YMM register or in a
128-bit memory location to eight packed single-precision floating-point values and writes the
converted values to a YMM destination register.


Instruction Support

Form         Subset   Feature Flag
VCVTPH2PS    F16C     CPUID Fn0000_0001_ECX[F16C] (bit 29)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                        Encoding
                                VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTPH2PS xmm1, xmm2/mem64      C4   RXB.02          0.1111.0.01  13 /r
VCVTPH2PS ymm1, xmm2/mem128     C4   RXB.02          0.1111.1.01  13 /r


Related Instructions 

VCVTPS2PH 

rFLAGS Affected 

None 

MXCSR Flags Affected

MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                                             M
17   15   14 13  12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.


Exception                            Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                               F    Instruction not supported, as indicated by CPUID feature identifier.
                                      F     F          AVX instructions are only recognized in protected mode.
                                                  F    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                                  F    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                                  F    VEX.W field = 1.
                                                  A    VEX.vvvv != 1111b.
                                                  F    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                                  F    Lock prefix (F0h) preceding opcode.
                                                  F    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                         F    CR0.TS = 1.
Stack, #SS                                        F    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                           F    Memory address exceeding data segment limit or non-canonical.
                                                  F    Null data segment used to reference memory.
Alignment check, #AC                              F    Unaligned memory reference when alignment checking enabled.
Page fault, #PF                                   F    Instruction execution caused a page fault.
SIMD Floating-Point Exception, #XF                F    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid-operation exception (IE)                  F    A source operand was an SNaN value.
                                                  F    Undefined operation.
Denormalized-operand exception (DE)               F    A source operand was a denormal value.
Overflow exception (OE)                           F    Rounded result too large to fit into the format of the destination operand.
Underflow exception (UE)                          F    Rounded result too small to fit into the format of the destination operand.
Precision exception (PE)                          F    A result could not be represented exactly in the destination format.

F - F16C exception.

VCVTPS2PH Convert Packed Single-Precision Floating-Point to 16-Bit Floating-Point

Converts packed single-precision floating-point values to packed 16-bit floating-point values and
writes the converted values to the destination register or to memory. An 8-bit immediate operand
provides dynamic control of rounding.

The operation of this instruction is illustrated in the following diagram. 

The handling of rounding is controlled by fields in the immediate byte, as shown in the following
table.


Rounding Control with Immediate Byte Operand

Mnemonic   Rounding Source (RS)   Rounding Control (RC)   Description                  Notes
Bit        2                      1    0
Value      0                      0    0                  Nearest                      Ignore MXCSR.RC.
           0                      0    1                  Down                         Ignore MXCSR.RC.
           0                      1    0                  Up                           Ignore MXCSR.RC.
           0                      1    1                  Truncate                     Ignore MXCSR.RC.
           1                      X    X                  Use MXCSR.RC for rounding.
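
The selection logic in the rounding control table can be written as a small decoder. This Python sketch is illustrative only: the function name effective_rounding and the mode strings are assumptions for the example, and it assumes MXCSR.RC uses the same two-bit encoding as the immediate RC field.

```python
RC_NAMES = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "truncate"}


def effective_rounding(imm8: int, mxcsr_rc: int) -> str:
    """Return the rounding mode selected by imm8 for VCVTPS2PH."""
    rs = (imm8 >> 2) & 1      # bit 2: rounding source
    if rs:
        rc = mxcsr_rc         # RS = 1: imm8[1:0] is ignored, MXCSR.RC is used
    else:
        rc = imm8             # RS = 0: imm8[1:0] is used, MXCSR.RC is ignored
    return RC_NAMES[rc & 0b11]


mode_from_imm = effective_rounding(imm8=0b011, mxcsr_rc=0b00)
mode_from_mxcsr = effective_rounding(imm8=0b100, mxcsr_rc=0b01)
```

With imm8 = 011b the immediate selects truncation regardless of MXCSR.RC; with imm8 = 100b the mode comes from MXCSR.RC.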



MXCSR[FTZ] has no effect on this instruction. Values within the half-precision denormal range are 
unconditionally converted to denormals. 

This extended-form instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Converts four packed single-precision floating-point values in an XMM register to four packed 16-bit 
floating-point values and writes the converted values to the low-order 64 bits of the destination XMM 
register or to a 64-bit memory location. When the result is written to the destination XMM register, 
the high-order 64 bits in the destination XMM register and the upper 128 bits of the corresponding 
YMM register are cleared to 0s.

YMM Encoding 

Converts eight packed single-precision floating-point values in a YMM register to eight packed
16-bit floating-point values and writes the converted values to the low-order 128 bits of a YMM
register or to a 128-bit memory location. When the result is written to the destination YMM register,
the high-order 128 bits in the register are cleared to 0s.

Instruction Support

Form         Subset   Feature Flag
VCVTPS2PH    F16C     CPUID Fn0000_0001_ECX[F16C] (bit 29)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic

VCVTPS2PH xmm1/mem64, xmm2, imm8
VCVTPS2PH xmm1/mem128, ymm2, imm8

Related Instructions 

VCVTPH2PS 

rFLAGS Affected 

None 


MXCSR Flags Affected

MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                    M    M    M         M    M
17   15   14 13  12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.


Encoding

VEX  RXB.map_select  W.vvvv.L.pp  Opcode
C4   RXB.03          0.1111.0.01  1D /r /imm8
C4   RXB.03          0.1111.1.01  1D /r /imm8

Exception                            Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                               F    Instruction not supported, as indicated by CPUID feature identifier.
                                      F     F          AVX instructions are only recognized in protected mode.
                                                  F    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                                  F    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                                  F    VEX.W field = 1.
                                                  A    VEX.vvvv != 1111b.
                                                  F    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                                  F    Lock prefix (F0h) preceding opcode.
                                                  F    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                         F    CR0.TS = 1.
Stack, #SS                                        F    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                           F    Memory address exceeding data segment limit or non-canonical.
                                                  F    Null data segment used to reference memory.
Alignment check, #AC                              F    Unaligned memory reference when alignment checking enabled.
Page fault, #PF                                   F    Instruction execution caused a page fault.
SIMD Floating-Point Exception, #XF                F    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid-operation exception (IE)                  F    A source operand was an SNaN value.
                                                  F    Undefined operation.
Denormalized-operand exception (DE)               F    A source operand was a denormal value.
Overflow exception (OE)                           F    Rounded result too large to fit into the format of the destination operand.
Underflow exception (UE)                          F    Rounded result too small to fit into the format of the destination operand.
Precision exception (PE)                          F    A result could not be represented exactly in the destination format.

F - F16C exception.

VEXTRACTF128 Extract Packed Floating-Point Values

Extracts 128 bits of packed data from a YMM register as specified by an immediate byte operand, and 
writes it to either an XMM register or a 128-bit memory location. 

Only bit [0] of the immediate operand is used. Operation is as follows. 

• When imm8[0] = 0, copy bits [127:0] of the source to the destination. 

• When imm8[0] = 1, copy bits [255:128] of the source to the destination. 

This extended-form instruction has a single 256-bit encoding. 

The source operand is a YMM register and the destination is either an XMM register or a 128-bit 
memory location. There is a third immediate byte operand. 
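
The half selection described above depends only on bit 0 of the immediate byte. A minimal Python model, assuming the source register is represented as 32 little-endian bytes (the function name extract_128 is hypothetical):

```python
def extract_128(ymm: bytes, imm8: int) -> bytes:
    """Model the 128-bit extract: pick one half of a 256-bit value by imm8[0]."""
    if len(ymm) != 32:
        raise ValueError("source must be exactly 32 bytes (256 bits)")
    half = imm8 & 1                        # only bit [0] of the immediate is used
    return ymm[16 * half:16 * half + 16]   # bits [127:0] or bits [255:128]


source = bytes(range(32))                  # hypothetical YMM contents, low byte first
low_half = extract_128(source, imm8=0)
high_half = extract_128(source, imm8=1)
```

Because the remaining immediate bits are ignored, imm8 = 2 selects the same half as imm8 = 0 in this model.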

Instruction Support

Form            Subset   Feature Flag
VEXTRACTF128    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                               Encoding
                                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VEXTRACTF128 xmm/mem128, ymm, imm8     C4   RXB.03          0.1111.1.01  19 /r ib

Related Instructions 

VBROADCASTF128, VINSERTF128 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     A    Instruction not supported, as indicated by CPUID feature identifier.
                            A     A          AVX instructions are only recognized in protected mode.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    VEX.W = 1.
                                        A    VEX.L = 0.
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A    Lock prefix (F0h) preceding opcode.
Device not available, #NM               A    CR0.TS = 1.
Stack, #SS                              A    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A    Memory address exceeding data segment limit or non-canonical.
                                        A    Write to a read-only data segment.
                                        A    Null data segment used to reference memory.
Page fault, #PF                         A    Instruction execution caused a page fault.
Alignment check, #AC                    A    Memory operand not 16-byte aligned when alignment checking enabled.

A - AVX exception.

VEXTRACTI128 Extract 128-bit Integer

Writes a selected 128-bit half of a YMM register to an XMM register or a 128-bit memory location 
based on the value of bit 0 of an immediate byte. 

There is a single form of this instruction: 

VEXTRACTI128 dest, src, imm8 

If imm8[0] = 0, the lower half of the source YMM register is selected; if imm8[0] = 1, the upper half 
of the source register is selected. 

There is a single VEX.L = 1 encoding of this instruction. 

The source operand is a YMM register. The destination is either an XMM register or a 128-bit
memory location. When the destination is a register, bits [255:128] of the YMM register that
corresponds to the destination are cleared.


Instruction Support

Form            Subset   Feature Flag
VEXTRACTI128    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)


Instruction Encoding

Mnemonic                                 Encoding
                                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VEXTRACTI128 xmm1/mem128, ymm2, imm8     C4   RXB.03          0.1111.1.01  39 /r ib


Related Instructions 

VBROADCASTF128, VBROADCASTI128, VEXTRACTF128, VINSERTF128, VINSERTI128 

rFLAGS Affected 

None 


MXCSR Flags Affected 

None 

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     A    Instruction not supported, as indicated by CPUID feature identifier.
                            A     A          AVX instructions are only recognized in protected mode.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    VEX.W = 1.
                                        A    VEX.vvvv != 1111b.
                                        A    VEX.L = 0.
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A    Lock prefix (F0h) preceding opcode.
Device not available, #NM               A    CR0.TS = 1.
Stack, #SS                              A    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A    Memory address exceeding data segment limit or non-canonical.
                                        A    Null data segment used to reference memory.
Page fault, #PF                         A    Instruction execution caused a page fault.
Alignment check, #AC                    A    Unaligned memory reference when alignment checking enabled.

A - AVX exception.

VFMADDPD, VFMADD132PD, VFMADD213PD, VFMADD231PD
Multiply and Add Packed Double-Precision Floating-Point

Multiplies together two double-precision floating-point vectors and adds the unrounded product to a
third double-precision floating-point vector producing a precise result which is then rounded to
double-precision based on the mode specified by the MXCSR[RC] field. The rounded sum is written to
the destination register. The role of each of the source operands specified by the assembly language
prototypes given below is reflected in the vector equation in the comment on the right.

There are two four-operand forms:

VFMADDPD dest, src1, src2/mem, src3        // dest = (src1 * src2/mem) + src3
VFMADDPD dest, src1, src2, src3/mem        // dest = (src1 * src2) + src3/mem

and three three-operand forms:

VFMADD132PD src1, src2, src3/mem           // src1 = (src1 * src3/mem) + src2
VFMADD213PD src1, src2, src3/mem           // src1 = (src2 * src1) + src3/mem
VFMADD231PD src1, src2, src3/mem           // src1 = (src2 * src3/mem) + src1


When VEX.L = 0, the vector size is 128 bits (two double-precision elements per vector) and register- 
based source operands are held in XMM registers. 

When VEX.L = 1, the vector size is 256 bits (four double-precision elements per vector) and register- 
based source operands are held in YMM registers. 

For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or a memory location and the third source 
is a register. 

• When VEX.W = 1, the second source is a register and the third source is either a register or a 
memory location. 

For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third 
operand is either a register or a memory location. 

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the 
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are 
cleared. 
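
The operand roles of the three-operand forms follow a pattern that a short Python model can make explicit: read as a sequence, the three digits in the mnemonic name the two multiplied operands and then the added operand. The function name fma3 is hypothetical, and vectors are modelled as Python lists. Python evaluates x * y + z with two roundings rather than one fused rounding, so the example uses small integer values for which the result is exact either way.

```python
def fma3(digits: str, src1: list, src2: list, src3: list) -> list:
    """Model the operand order of VFMADD132/213/231; the result replaces src1."""
    operands = {"1": src1, "2": src2, "3": src3}
    # First two digits: the operands that are multiplied; third digit: the addend.
    mul_a, mul_b, addend = (operands[d] for d in digits)
    return [x * y + z for x, y, z in zip(mul_a, mul_b, addend)]


src1, src2, src3 = [2.0, 3.0], [5.0, 7.0], [11.0, 13.0]   # two-element (128-bit) vectors

r132 = fma3("132", src1, src2, src3)   # (src1 * src3) + src2
r213 = fma3("213", src1, src2, src3)   # (src2 * src1) + src3
r231 = fma3("231", src1, src2, src3)   # (src2 * src3) + src1
```

In all three forms the third operand is the only one that may come from memory, so the choice of form determines whether the memory operand acts as a multiplicand or as the addend.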


Instruction Support

Form           Subset   Feature Flag
VFMADDPD       FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMADDnnnPD    FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                                  Encoding
                                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFMADDPD xmm1, xmm2, xmm3/mem128, xmm4    C4   RXB.03          0.src1.0.01  69 /r /is4
VFMADDPD ymm1, ymm2, ymm3/mem256, ymm4    C4   RXB.03          0.src1.1.01  69 /r /is4
VFMADDPD xmm1, xmm2, xmm3, xmm4/mem128    C4   RXB.03          1.src1.0.01  69 /r /is4
VFMADDPD ymm1, ymm2, ymm3, ymm4/mem256    C4   RXB.03          1.src1.1.01  69 /r /is4
VFMADD132PD xmm0, xmm1, xmm2/m128         C4   RXB.02          1.src2.0.01  98 /r
VFMADD132PD ymm0, ymm1, ymm2/m256         C4   RXB.02          1.src2.1.01  98 /r
VFMADD213PD xmm0, xmm1, xmm2/m128         C4   RXB.02          1.src2.0.01  A8 /r
VFMADD213PD ymm0, ymm1, ymm2/m256         C4   RXB.02          1.src2.1.01  A8 /r
VFMADD231PD xmm0, xmm1, xmm2/m128         C4   RXB.02          1.src2.0.01  B8 /r
VFMADD231PD ymm0, ymm1, ymm2/m256         C4   RXB.02          1.src2.1.01  B8 /r


Related Instructions 

VFMADDPS, VFMADD132PS, VFMADD213PS, VFMADD231PS, VFMADDSD, 
VFMADD132SD, VFMADD213SD, VFMADD231SD, VFMADDSS, VFMADD132SS, 
VFMADD213SS, VFMADD231SS 

rFLAGS Affected 

None 


MXCSR Flags Affected

MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                    M    M    M         M    M
17   15   14 13  12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     F    Instruction not supported, as indicated by CPUID feature identifier.
                            F     F          FMA instructions are only recognized in protected mode.
                                        F    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F    Lock prefix (F0h) preceding opcode.
                                        F    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F    CR0.TS = 1.
Stack, #SS                              F    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F    Memory address exceeding data segment limit or non-canonical.
                                        F    Null data segment used to reference memory.
Page fault, #PF                         F    Instruction execution caused a page fault.
Alignment check, #AC                    F    Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                F    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                   F    A source operand was an SNaN value.
                                        F    Undefined operation.
Denormalized operand, DE                F    A source operand was a denormal value.
Overflow, OE                            F    Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F    Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F    A result could not be represented exactly in the destination format.

F - FMA, FMA4 exception

VFMADDPS, VFMADD132PS, VFMADD213PS, VFMADD231PS
Multiply and Add Packed Single-Precision Floating-Point

Multiplies together two single-precision floating-point vectors and adds the unrounded product to a
third single-precision floating-point vector producing a precise result which is then rounded to
single-precision based on the mode specified by the MXCSR[RC] field. The rounded sum is written to the
destination register. The role of each of the source operands specified by the assembly language
prototypes given below is reflected in the vector equation in the comment on the right.

There are two four-operand forms:

VFMADDPS dest, src1, src2/mem, src3        // dest = (src1 * src2/mem) + src3
VFMADDPS dest, src1, src2, src3/mem        // dest = (src1 * src2) + src3/mem

and three three-operand forms:

VFMADD132PS src1, src2, src3/mem           // src1 = (src1 * src3/mem) + src2
VFMADD213PS src1, src2, src3/mem           // src1 = (src2 * src1) + src3/mem
VFMADD231PS src1, src2, src3/mem           // src1 = (src2 * src3/mem) + src1


When VEX.L = 0, the vector size is 128 bits (four single-precision elements per vector) and register- 
based source operands are held in XMM registers. 

When VEX.L = 1, the vector size is 256 bits (eight single-precision elements per vector) and register- 
based source operands are held in YMM registers. 

For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or a memory location and the third source 
is a register. 

• When VEX.W = 1, the second source is a register and the third source is either a register or a 
memory location. 

For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third 
operand is either a register or a memory location. 

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the 
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are 
cleared. 
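
The practical effect of adding the unrounded product can be seen by comparing one rounding against two. The Python sketch below is a model under stated assumptions, not the hardware algorithm: it assumes round-to-nearest, handles one element at a time, and uses exact rational arithmetic for the fused path. The function names are hypothetical. The conversion through float() in fused() is lossless only because the exact result for these particular inputs is representable; arbitrary inputs would need a direct rational-to-single rounding.

```python
import struct
from fractions import Fraction


def to_single(value: float) -> float:
    """Round a Python float to single precision (round-to-nearest) and read it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def fused(a: float, b: float, c: float) -> float:
    """One rounding: the exact product is added to c before rounding to single."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return to_single(float(exact))


def unfused(a: float, b: float, c: float) -> float:
    """Two roundings: the product is rounded to single before the add."""
    return to_single(to_single(a * b) + c)


a = b = 1.0 + 2.0 ** -12        # exactly representable in single precision
c = -(1.0 + 2.0 ** -11)         # exactly representable in single precision

fused_result = fused(a, b, c)       # keeps the low-order term of the product
unfused_result = unfused(a, b, c)   # the low-order term is rounded away first
```

For these inputs the exact product is 1 + 2^-11 + 2^-24. Rounding it to single precision before the add discards the 2^-24 term, so the two-step result is zero while the fused result is 2^-24.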


Instruction Support

Form           Subset   Feature Flag
VFMADDPS       FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMADDnnnPS    FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

Mnemonic                                  Encoding
                                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFMADDPS xmm1, xmm2, xmm3/mem128, xmm4    C4   RXB.03          0.src1.0.01  68 /r /is4
VFMADDPS ymm1, ymm2, ymm3/mem256, ymm4    C4   RXB.03          0.src1.1.01  68 /r /is4
VFMADDPS xmm1, xmm2, xmm3, xmm4/mem128    C4   RXB.03          1.src1.0.01  68 /r /is4
VFMADDPS ymm1, ymm2, ymm3, ymm4/mem256    C4   RXB.03          1.src1.1.01  68 /r /is4
VFMADD132PS xmm0, xmm1, xmm2/m128         C4   RXB.02          0.src2.0.01  98 /r
VFMADD132PS ymm0, ymm1, ymm2/m256         C4   RXB.02          0.src2.1.01  98 /r
VFMADD213PS xmm0, xmm1, xmm2/m128         C4   RXB.02          0.src2.0.01  A8 /r
VFMADD213PS ymm0, ymm1, ymm2/m256         C4   RXB.02          0.src2.1.01  A8 /r
VFMADD231PS xmm0, xmm1, xmm2/m128         C4   RXB.02          0.src2.0.01  B8 /r
VFMADD231PS ymm0, ymm1, ymm2/m256         C4   RXB.02          0.src2.1.01  B8 /r


Related Instructions 

VFMADDPD, VFMADD132PD, VFMADD213PD, VFMADD231PD, VFMADDSD,
VFMADD132SD, VFMADD213SD, VFMADD231SD, VFMADDSS, VFMADD132SS,
VFMADD213SS, VFMADD231SS

rFLAGS Affected 

None 


MXCSR Flags Affected

MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                    M    M    M         M    M
17   15   14 13  12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     F    Instruction not supported, as indicated by CPUID feature identifier.
                            F     F          FMA instructions are only recognized in protected mode.
                                        F    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F    Lock prefix (F0h) preceding opcode.
                                        F    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F    CR0.TS = 1.
Stack, #SS                              F    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F    Memory address exceeding data segment limit or non-canonical.
                                        F    Null data segment used to reference memory.
Page fault, #PF                         F    Instruction execution caused a page fault.
Alignment check, #AC                    F    Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                F    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                   F    A source operand was an SNaN value.
                                        F    Undefined operation.
Denormalized operand, DE                F    A source operand was a denormal value.
Overflow, OE                            F    Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F    Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F    A result could not be represented exactly in the destination format.

F - FMA, FMA4 exception

VFMADDSD, VFMADD132SD, VFMADD213SD, VFMADD231SD
Multiply and Add Scalar Double-Precision Floating-Point

Multiplies together two double-precision floating-point values and adds the unrounded product to a
third double-precision floating-point value producing a precise result which is then rounded to
double-precision based on the mode specified by the MXCSR[RC] field. The rounded sum is written to
the destination register. The role of each of the source operands specified by the assembly language
prototypes given below is reflected in the equation in the comment on the right.

There are two four-operand forms:

VFMADDSD dest, src1, src2/mem64, src3    // dest = (src1 * src2/mem64) + src3
VFMADDSD dest, src1, src2, src3/mem64    // dest = (src1 * src2) + src3/mem64

and three three-operand forms:

VFMADD132SD src1, src2, src3/mem64       // src1 = (src1 * src3/mem64) + src2
VFMADD213SD src1, src2, src3/mem64       // src1 = (src2 * src1) + src3/mem64
VFMADD231SD src1, src2, src3/mem64       // src1 = (src2 * src3/mem64) + src1

All 64-bit double-precision floating-point register-based operands are held in the lower quadword of 
XMM registers. The result is written to the lower quadword of the destination register. For those 
instructions that use a memory-based operand, one of the source operands is a 64-bit value read from 
memory. 

For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or a 64-bit memory location and the third 
source is a register. 

• When VEX.W = 1, the second source is a register and the third source is either a register or a 64-bit 
memory location. 

For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third 
operand is either a register or a 64-bit memory location. 

The destination is an XMM register. When the result is written to the destination XMM register, bits 
[127:64] of the destination and bits [255:128] of the corresponding YMM register are cleared. 
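
The single rounding step described above can be illustrated with a short Python sketch. It is not taken from this manual: the function names are hypothetical, only round-to-nearest-even (MXCSR[RC] = 00b) is modeled, and NaN, infinity, and exception reporting are ignored. Exact rational arithmetic stands in for the unrounded intermediate result.

```python
from fractions import Fraction

def fma_exact(a: float, b: float, c: float) -> float:
    """Return a*b + c with one rounding (round-to-nearest-even only)."""
    # Every finite double converts to a Fraction exactly, so the product
    # and the sum below carry no intermediate rounding error.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # the single rounding to double precision

# Hypothetical helpers mirroring the three-operand prototypes; each returns
# the value that would be written to src1.
def vfmadd132sd(src1: float, src2: float, src3: float) -> float:
    return fma_exact(src1, src3, src2)   # (src1 * src3) + src2

def vfmadd213sd(src1: float, src2: float, src3: float) -> float:
    return fma_exact(src2, src1, src3)   # (src2 * src1) + src3

def vfmadd231sd(src1: float, src2: float, src3: float) -> float:
    return fma_exact(src2, src3, src1)   # (src2 * src3) + src1

# A case where fused and unfused evaluation differ.
a = 1.0 + 2.0**-52
c = -(1.0 + 2.0**-51)
fused = fma_exact(a, a, c)   # keeps the 2**-104 term of a*a
unfused = a * a + c          # a*a is rounded first, so that term is lost
```

With these inputs the fused result is 2**-104, while the separately rounded multiply and add yield 0.0.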


Instruction Support

Form             Subset  Feature Flag
VFMADDSD         FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMADDnnnSD      FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
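
As a rough illustration of the feature-flag test, the following Python sketch checks the two bits listed in the table. It assumes the ECX values have already been obtained by executing CPUID with functions 0000_0001h and 8000_0001h (obtaining them is outside the sketch); the function and constant names are hypothetical.

```python
FMA_BIT = 12    # CPUID Fn0000_0001_ECX[FMA]
FMA4_BIT = 16   # CPUID Fn8000_0001_ECX[FMA4]

def supported_subsets(std_ecx: int, ext_ecx: int) -> dict:
    """Report FMA/FMA4 support from previously captured CPUID ECX values.

    std_ecx: ECX returned by CPUID function 0000_0001h
    ext_ecx: ECX returned by CPUID function 8000_0001h
    """
    return {
        "FMA": bool((std_ecx >> FMA_BIT) & 1),    # three-operand forms
        "FMA4": bool((ext_ecx >> FMA4_BIT) & 1),  # four-operand forms
    }
```

A caller would select the three-operand or four-operand form according to which entry is true.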
Instruction Encoding

                                            Encoding
Mnemonic                                    VEX  RXB.mapselect  W.vvvv.L.pp  Opcode
VFMADDSD xmm1, xmm2, xmm3/mem128, xmm4      C4   RXB.03         0.src1.X.01  6B /r /is4
VFMADDSD xmm1, xmm2, xmm3, xmm4/mem128      C4   RXB.03         1.src1.X.01  6B /r /is4
VFMADD132SD xmm0, xmm1, xmm2/m128           C4   RXB.02         1.src2.X.01  99 /r
VFMADD213SD xmm0, xmm1, xmm2/m128           C4   RXB.02         1.src2.X.01  A9 /r
VFMADD231SD xmm0, xmm1, xmm2/m128           C4   RXB.02         1.src2.X.01  B9 /r
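
To make the encoding columns concrete, here is a Python sketch that assembles the bytes for VFMADD231SD xmm0, xmm1, xmm2. It relies on the general three-byte VEX layout (R, X, B, and vvvv stored in inverted form), which is defined elsewhere in the manual rather than in this entry. The helper names are hypothetical, and the ignored L bit (shown as X in the table) is assumed to be 0.

```python
def vex3(r: int, x: int, b: int, map_select: int,
         w: int, vvvv: int, l: int, pp: int) -> bytes:
    """Assemble a three-byte VEX prefix (C4h escape byte)."""
    # R, X, B and vvvv are held in inverted (ones' complement) form.
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (map_select & 0x1F)
    byte2 = ((w & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 0x3)
    return bytes([0xC4, byte1, byte2])

def modrm(mod: int, reg: int, rm: int) -> bytes:
    """Assemble a ModRM byte from its three fields."""
    return bytes([((mod & 3) << 6) | ((reg & 7) << 3) | (rm & 7)])

# VFMADD231SD xmm0, xmm1, xmm2:
#   map_select = 02h, W = 1, vvvv = src2 (xmm1), pp = 01b, opcode B9h,
#   ModRM.reg = src1/dest (xmm0), ModRM.r/m = src3 (xmm2), register-direct.
encoded = vex3(0, 0, 0, 0x02, 1, 1, 0, 0b01) + bytes([0xB9]) + modrm(3, 0, 2)
```

Under these assumptions the sketch yields the byte sequence C4 E2 F1 B9 C2.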


Related Instructions 

VFMADDPD, VFMADD132PD, VFMADD213PD, VFMADD231PD, VFMADDPS,
VFMADD132PS, VFMADD213PS, VFMADD231PS, VFMADDSS, VFMADD132SS,
VFMADD213SS, VFMADD231SS

rFLAGS Affected 

None 


MXCSR Flags Affected

MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M   M   M       M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.
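
The bit positions in the table can be applied to a saved MXCSR value as in the Python sketch below. The helper names are hypothetical; 1F80h is used in the usage lines as the customary reset value of MXCSR (all exceptions masked, no flags set, round-to-nearest).

```python
# Exception flag positions, taken from the bit-number row of the table.
FLAG_BITS = {"IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5}

# Flags marked M for this instruction (ZE is left blank in the table).
MAY_MODIFY = ("IE", "DE", "OE", "UE", "PE")

def rounding_control(mxcsr: int) -> int:
    """Extract MXCSR[RC], bits 14:13."""
    return (mxcsr >> 13) & 0b11

def set_flags(mxcsr: int) -> list:
    """Return the names of the exception flags currently set."""
    return [name for name, bit in FLAG_BITS.items() if (mxcsr >> bit) & 1]

default = 0x1F80
after_inexact_invalid = default | (1 << FLAG_BITS["PE"]) | (1 << FLAG_BITS["IE"])
```

Because the flags are sticky, software would normally clear them before the operation of interest and inspect them afterwards.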
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception 
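
The #UD causes listed above can be read as a sequence of checks. The Python sketch below is only an illustrative model with hypothetical parameter names; it omits the case of an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, and it does not imply any particular priority order among the checks.

```python
def fma_raises_ud(feature_supported: bool,
                  protected_mode: bool,
                  cr4_osxsave: int,
                  xfeature_enabled_mask: int,
                  legacy_prefix_before_vex: bool = False,
                  lock_prefix: bool = False) -> bool:
    """Return True if one of the modeled #UD conditions applies."""
    if not feature_supported:          # CPUID feature flag is clear
        return True
    if not protected_mode:             # real or virtual-8086 mode
        return True
    if cr4_osxsave == 0:               # CR4.OSXSAVE = 0
        return True
    if ((xfeature_enabled_mask >> 1) & 0b11) != 0b11:   # bits [2:1] != 11b
        return True
    if legacy_prefix_before_vex:       # REX, F2, F3, or 66 before VEX
        return True
    if lock_prefix:                    # F0h preceding the opcode
        return True
    return False
```

For example, a mask value of 011b fails the bits [2:1] test because bit 2 is clear.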
VFMADDSS       Multiply and Add
VFMADD132SS    Scalar Single-Precision Floating-Point
VFMADD213SS
VFMADD231SS


Multiplies together two single-precision floating-point values and adds the unrounded product to a
third single-precision floating-point value, producing a precise result which is then rounded to
single-precision based on the mode specified by the MXCSR[RC] field. The rounded sum is written to
the destination register. The role of each of the source operands specified by the assembly language
prototypes given below is reflected in the equation in the comment on the right.

There are two four-operand forms:

VFMADDSS dest, src1, src2/mem32, src3    // dest = (src1 * src2/mem32) + src3
VFMADDSS dest, src1, src2, src3/mem32    // dest = (src1 * src2) + src3/mem32

and three three-operand forms:

VFMADD132SS src1, src2, src3/mem32       // src1 = (src1 * src3/mem32) + src2
VFMADD213SS src1, src2, src3/mem32       // src1 = (src2 * src1) + src3/mem32
VFMADD231SS src1, src2, src3/mem32       // src1 = (src2 * src3/mem32) + src1


All 32-bit single-precision floating-point register-based operands are held in the lower doubleword of 
XMM registers. The result is written to the low doubleword of the destination register. For those 
instructions that use a memory-based operand, one of the source operands is a 32-bit value read from 
memory. 

For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or a 32-bit memory location and the third 
source is a register. 

• When VEX.W = 1, the second source is a register and the third source is either a register or a
32-bit memory location.

For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third 
operand is either a register or a 32-bit memory location. 

The destination is an XMM register. When the result is written to the destination XMM register, bits 
[127:32] of the destination and bits [255:128] of the corresponding YMM register are cleared. 
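
The effect on the destination register can be modeled as follows. This Python sketch treats the YMM register as a 256-bit integer; the function name is hypothetical, and the result passed in is assumed to be already rounded to single precision.

```python
import struct

YMM_MASK = (1 << 256) - 1

def write_scalar_single(old_ymm: int, result: float) -> int:
    """Model the destination YMM register after a scalar single-precision write."""
    # Encode the result as an IEEE 754 binary32 bit pattern.
    low32 = struct.unpack("<I", struct.pack("<f", result))[0]
    # Bits [127:32] of the XMM register and bits [255:128] of the YMM register
    # are cleared, so nothing from old_ymm survives the write.
    return low32 & YMM_MASK

new_value = write_scalar_single(YMM_MASK, 1.0)
```

Even when every bit of the old register value is set, only the low doubleword of the new value is nonzero.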


Instruction Support

Form             Subset  Feature Flag
VFMADDSS         FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMADDnnnSS      FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

                                            Encoding
Mnemonic                                    VEX  RXB.mapselect  W.vvvv.L.pp  Opcode
VFMADDSS xmm1, xmm2, xmm3/mem32, xmm4       C4   RXB.03         0.src1.X.01  6A /r /is4
VFMADDSS xmm1, xmm2, xmm3, xmm4/mem32       C4   RXB.03         1.src1.X.01  6A /r /is4
VFMADD132SS xmm1, xmm2, xmm3/mem32          C4   RXB.02         0.src2.X.01  99 /r
VFMADD213SS xmm1, xmm2, xmm3/mem32          C4   RXB.02         0.src2.X.01  A9 /r
VFMADD231SS xmm1, xmm2, xmm3/mem32          C4   RXB.02         0.src2.X.01  B9 /r


Related Instructions 

VFMADDPD, VFMADD132PD, VFMADD213PD, VFMADD231PD, VFMADDPS,
VFMADD132PS, VFMADD213PS, VFMADD231PS, VFMADDSD, VFMADD132SD,
VFMADD213SD, VFMADD231SD

rFLAGS Affected 

None 


MXCSR Flags Affected

MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M   M   M       M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.



AMD64 Technology 


26568 — Rev. 3.23—February 2019 


Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception 
VFMADDSUBPD       Multiply with Alternating Add/Subtract
VFMADDSUB132PD    Packed Double-Precision Floating-Point
VFMADDSUB213PD
VFMADDSUB231PD


Multiplies together two double-precision floating-point vectors, adds odd elements of the unrounded 
product to odd elements of a third double-precision floating-point vector, and subtracts even elements 
of the third floating point vector from even elements of unrounded product. The precise result of each 
addition or subtraction is then rounded to double-precision based on the mode specified by the 
MXCSR[RC] field and written to the corresponding element of the destination. 

The role of each of the source operands specified by the assembly language prototypes given below is 
reflected in the equation in the comment on the right. 

There are two four-operand forms:

VFMADDSUBPD dest, src1, src2/mem, src3    // dest_odd  = (src1_odd * src2_odd/mem_odd) + src3_odd
                                          // dest_even = (src1_even * src2_even/mem_even) - src3_even
VFMADDSUBPD dest, src1, src2, src3/mem    // dest_odd  = (src1_odd * src2_odd) + src3_odd/mem_odd
                                          // dest_even = (src1_even * src2_even) - src3_even/mem_even

and three three-operand forms:

VFMADDSUB132PD src1, src2, src3/mem       // src1_odd  = (src1_odd * src3_odd/mem_odd) + src2_odd
                                          // src1_even = (src1_even * src3_even/mem_even) - src2_even
VFMADDSUB213PD src1, src2, src3/mem       // src1_odd  = (src2_odd * src1_odd) + src3_odd/mem_odd
                                          // src1_even = (src2_even * src1_even) - src3_even/mem_even
VFMADDSUB231PD src1, src2, src3/mem       // src1_odd  = (src2_odd * src3_odd/mem_odd) + src1_odd
                                          // src1_even = (src2_even * src3_even/mem_even) - src1_even


When VEX.L = 0, the vector size is 128 bits (two double-precision elements per vector) and register- 
based source operands are held in XMM registers. 

When VEX.L = 1, the vector size is 256 bits (four double-precision elements per vector) and register- 
based source operands are held in YMM registers. 

For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or a memory location and the third source 
is a register. 

• When VEX.W = 1, the second source is a register and the third source is either a register or a 
memory location. 

For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third 
operand is either a register or a memory location. 

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the 
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are 
cleared. 
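
The alternating pattern can be sketched in Python as shown below. This is an illustrative model, not the manual's definition: the function name is hypothetical, elements are assumed to be numbered from 0 starting at the least-significant position, only round-to-nearest-even is modeled, and special values and exception flags are ignored.

```python
from fractions import Fraction

def fmaddsub_pd(mul_a: list, mul_b: list, addend: list) -> list:
    """Per element: product + addend for odd indexes, product - addend for even."""
    result = []
    for i, (a, b, c) in enumerate(zip(mul_a, mul_b, addend)):
        product = Fraction(a) * Fraction(b)        # unrounded product
        if i % 2:                                  # odd element: add
            exact = product + Fraction(c)
        else:                                      # even element: subtract
            exact = product - Fraction(c)
        result.append(float(exact))                # single rounding per element
    return result

# Two elements model VEX.L = 0; four elements model VEX.L = 1.
xmm_result = fmaddsub_pd([1.0, 2.0], [3.0, 4.0], [5.0, 6.0])
ymm_result = fmaddsub_pd([1.0, 2.0, 3.0, 4.0], [1.0] * 4, [10.0] * 4)
```

The order in which the three arguments map onto src1, src2, and src3 depends on the instruction form, as given by the prototypes.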
Instruction Support

Form             Subset  Feature Flag
VFMADDSUBPD      FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMADDSUBnnnPD   FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

                                            Encoding
Mnemonic                                    VEX  RXB.mapselect  W.vvvv.L.pp  Opcode
VFMADDSUBPD xmm1, xmm2, xmm3/mem128, xmm4   C4   RXB.03         0.src1.0.01  5D /r /is4
VFMADDSUBPD ymm1, ymm2, ymm3/mem256, ymm4   C4   RXB.03         0.src1.1.01  5D /r /is4
VFMADDSUBPD xmm1, xmm2, xmm3, xmm4/mem128   C4   RXB.03         1.src1.0.01  5D /r /is4
VFMADDSUBPD ymm1, ymm2, ymm3, ymm4/mem256   C4   RXB.03         1.src1.1.01  5D /r /is4
VFMADDSUB132PD xmm1, xmm2, xmm3/mem128      C4   RXB.02         1.src2.0.01  96 /r
VFMADDSUB132PD ymm1, ymm2, ymm3/mem256      C4   RXB.02         1.src2.1.01  96 /r
VFMADDSUB213PD xmm1, xmm2, xmm3/mem128      C4   RXB.02         1.src2.0.01  A6 /r
VFMADDSUB213PD ymm1, ymm2, ymm3/mem256      C4   RXB.02         1.src2.1.01  A6 /r
VFMADDSUB231PD xmm1, xmm2, xmm3/mem128      C4   RXB.02         1.src2.0.01  B6 /r
VFMADDSUB231PD ymm1, ymm2, ymm3/mem256      C4   RXB.02         1.src2.1.01  B6 /r

Related Instructions

VFMSUBADDPD, VFMSUBADD132PD, VFMSUBADD213PD, VFMSUBADD231PD,
VFMADDSUBPS, VFMADDSUB132PS, VFMADDSUB213PS, VFMADDSUB231PS,
VFMSUBADDPS, VFMSUBADD132PS, VFMSUBADD213PS, VFMSUBADD231PS

rFLAGS Affected 

None 


MXCSR Flags Affected

MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M   M   M       M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception 
VFMADDSUBPS       Multiply with Alternating Add/Subtract
VFMADDSUB132PS    Packed Single-Precision Floating-Point
VFMADDSUB213PS
VFMADDSUB231PS


Multiplies together two single-precision floating-point vectors, adds odd elements of the unrounded 
product to odd elements of a third single-precision floating-point vector, and subtracts even elements 
of the third floating point vector from even elements of unrounded product. The precise result of each 
addition or subtraction is then rounded to single-precision based on the mode specified by the 
MXCSR[RC] field and written to the corresponding element of the destination. 

The role of each of the source operands specified by the assembly language prototypes given below is 
reflected in the equation in the comment on the right. 

There are two four-operand forms:

VFMADDSUBPS dest, src1, src2/mem, src3    // dest_odd  = (src1_odd * src2_odd/mem_odd) + src3_odd
                                          // dest_even = (src1_even * src2_even/mem_even) - src3_even
VFMADDSUBPS dest, src1, src2, src3/mem    // dest_odd  = (src1_odd * src2_odd) + src3_odd/mem_odd
                                          // dest_even = (src1_even * src2_even) - src3_even/mem_even

and three three-operand forms:

VFMADDSUB132PS src1, src2, src3/mem       // src1_odd  = (src1_odd * src3_odd/mem_odd) + src2_odd
                                          // src1_even = (src1_even * src3_even/mem_even) - src2_even
VFMADDSUB213PS src1, src2, src3/mem       // src1_odd  = (src2_odd * src1_odd) + src3_odd/mem_odd
                                          // src1_even = (src2_even * src1_even) - src3_even/mem_even
VFMADDSUB231PS src1, src2, src3/mem       // src1_odd  = (src2_odd * src3_odd/mem_odd) + src1_odd
                                          // src1_even = (src2_even * src3_even/mem_even) - src1_even


When VEX.L = 0, the vector size is 128 bits (four single-precision elements per vector) and register- 
based source operands are held in XMM registers. 

When VEX.L = 1, the vector size is 256 bits (eight single-precision elements per vector) and register- 
based source operands are held in YMM registers. 

For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or a memory location and the third source 
is a register. 

• When VEX.W = 1, the second source is a register and the third source is either a register or a 
memory location. 

For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third 
operand is either a register or a memory location. 

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the 
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are 
cleared. 
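
As a small illustration of how VEX.L fixes the element count, the Python sketch below lists which operation applies to each element. The function name is hypothetical, and elements are assumed to be numbered from 0 starting at the least-significant position.

```python
def fmaddsub_lane_ops(vex_l: int, element_bits: int = 32) -> list:
    """Return 'add' or 'sub' for each element of a VFMADDSUB-style operation."""
    vector_bits = 256 if vex_l else 128          # VEX.L selects YMM or XMM width
    count = vector_bits // element_bits          # 4 or 8 single-precision elements
    # Odd elements are added, even elements are subtracted.
    return ["add" if i % 2 else "sub" for i in range(count)]
```

Passing element_bits=64 gives the corresponding pattern for the packed double-precision forms.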
Instruction Support

Form             Subset  Feature Flag
VFMADDSUBPS      FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMADDSUBnnnPS   FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

                                            Encoding
Mnemonic                                    VEX  RXB.mapselect  W.vvvv.L.pp  Opcode
VFMADDSUBPS xmm1, xmm2, xmm3/mem128, xmm4   C4   RXB.03         0.src1.0.01  5C /r /is4
VFMADDSUBPS ymm1, ymm2, ymm3/mem256, ymm4   C4   RXB.03         0.src1.1.01  5C /r /is4
VFMADDSUBPS xmm1, xmm2, xmm3, xmm4/mem128   C4   RXB.03         1.src1.0.01  5C /r /is4
VFMADDSUBPS ymm1, ymm2, ymm3, ymm4/mem256   C4   RXB.03         1.src1.1.01  5C /r /is4
VFMADDSUB132PS xmm1, xmm2, xmm3/mem128      C4   RXB.02         0.src2.0.01  96 /r
VFMADDSUB132PS ymm1, ymm2, ymm3/mem256      C4   RXB.02         0.src2.1.01  96 /r
VFMADDSUB213PS xmm1, xmm2, xmm3/mem128      C4   RXB.02         0.src2.0.01  A6 /r
VFMADDSUB213PS ymm1, ymm2, ymm3/mem256      C4   RXB.02         0.src2.1.01  A6 /r
VFMADDSUB231PS xmm1, xmm2, xmm3/mem128      C4   RXB.02         0.src2.0.01  B6 /r
VFMADDSUB231PS ymm1, ymm2, ymm3/mem256      C4   RXB.02         0.src2.1.01  B6 /r

Related Instructions

VFMADDSUBPD, VFMADDSUB132PD, VFMADDSUB213PD, VFMADDSUB231PD,
VFMSUBADDPD, VFMSUBADD132PD, VFMSUBADD213PD, VFMSUBADD231PD,
VFMSUBADDPS, VFMSUBADD132PS, VFMSUBADD213PS, VFMSUBADD231PS

rFLAGS Affected 

None 


MXCSR Flags Affected

MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M   M   M       M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception 
VFMSUBADDPD       Multiply with Alternating Subtract/Add
VFMSUBADD132PD    Packed Double-Precision Floating-Point
VFMSUBADD213PD
VFMSUBADD231PD

Multiplies together two double-precision floating-point vectors, adds even elements of the unrounded 
product to even elements of a third double-precision floating-point vector, and subtracts odd elements 
of the third floating point vector from odd elements of unrounded product. The precise result of each 
addition or subtraction is then rounded to double-precision based on the mode specified by the 
MXCSR[RC] field and written to the corresponding element of the destination. 

The role of each of the source operands specified by the assembly language prototypes given below is 
reflected in the equation in the comment on the right. 

There are two four-operand forms:

VFMSUBADDPD dest, src1, src2/mem, src3    // dest_odd  = (src1_odd * src2_odd/mem_odd) - src3_odd
                                          // dest_even = (src1_even * src2_even/mem_even) + src3_even
VFMSUBADDPD dest, src1, src2, src3/mem    // dest_odd  = (src1_odd * src2_odd) - src3_odd/mem_odd
                                          // dest_even = (src1_even * src2_even) + src3_even/mem_even

and three three-operand forms:

VFMSUBADD132PD src1, src2, src3/mem       // src1_odd  = (src1_odd * src3_odd/mem_odd) - src2_odd
                                          // src1_even = (src1_even * src3_even/mem_even) + src2_even
VFMSUBADD213PD src1, src2, src3/mem       // src1_odd  = (src2_odd * src1_odd) - src3_odd/mem_odd
                                          // src1_even = (src2_even * src1_even) + src3_even/mem_even
VFMSUBADD231PD src1, src2, src3/mem       // src1_odd  = (src2_odd * src3_odd/mem_odd) - src1_odd
                                          // src1_even = (src2_even * src3_even/mem_even) + src1_even

For VEX.L = 0, vector size is 128 bits and register-based operands are held in XMM registers. For 
VEX.L = 1, vector size is 256 bits and register-based operands are held in YMM registers. 

For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or a memory location and the third source 
is a register. 

• When VEX.W = 1, the second source is a register and the third source operand is either a register 
or a memory location. 

For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third 
operand is either a register or a memory location. 

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the 
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are 
cleared. 
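
A Python sketch of the subtract/add pattern follows. It is an illustrative model under stated assumptions: the function name is hypothetical, elements are numbered from 0 starting at the least-significant position, only round-to-nearest-even is modeled, and special values and exception flags are ignored.

```python
from fractions import Fraction

def fmsubadd_pd(mul_a: list, mul_b: list, addend: list) -> list:
    """Per element: product - addend for odd indexes, product + addend for even."""
    result = []
    for i, (a, b, c) in enumerate(zip(mul_a, mul_b, addend)):
        product = Fraction(a) * Fraction(b)        # unrounded product
        if i % 2:                                  # odd element: subtract
            exact = product - Fraction(c)
        else:                                      # even element: add
            exact = product + Fraction(c)
        result.append(float(exact))                # single rounding per element
    return result

xmm_result = fmsubadd_pd([1.0, 2.0], [3.0, 4.0], [5.0, 6.0])
ymm_result = fmsubadd_pd([1.0, 2.0, 3.0, 4.0], [1.0] * 4, [10.0] * 4)
```

For finite inputs in this model, the result equals that of the add/subtract pattern applied to a negated third vector.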

Instruction Support

Form             Subset  Feature Flag
VFMSUBADDPD      FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMSUBADDnnnPD   FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                            Encoding
Mnemonic                                    VEX  RXB.mapselect  W.vvvv.L.pp  Opcode
VFMSUBADDPD xmm1, xmm2, xmm3/mem128, xmm4   C4   RXB.03         0.src1.0.01  5F /r /is4
VFMSUBADDPD ymm1, ymm2, ymm3/mem256, ymm4   C4   RXB.03         0.src1.1.01  5F /r /is4
VFMSUBADDPD xmm1, xmm2, xmm3, xmm4/mem128   C4   RXB.03         1.src1.0.01  5F /r /is4
VFMSUBADDPD ymm1, ymm2, ymm3, ymm4/mem256   C4   RXB.03         1.src1.1.01  5F /r /is4
VFMSUBADD132PD xmm1, xmm2, xmm3/mem128      C4   RXB.02         1.src2.0.01  97 /r
VFMSUBADD132PD ymm1, ymm2, ymm3/mem256      C4   RXB.02         1.src2.1.01  97 /r
VFMSUBADD213PD xmm1, xmm2, xmm3/mem128      C4   RXB.02         1.src2.0.01  A7 /r
VFMSUBADD213PD ymm1, ymm2, ymm3/mem256      C4   RXB.02         1.src2.1.01  A7 /r
VFMSUBADD231PD xmm1, xmm2, xmm3/mem128      C4   RXB.02         1.src2.0.01  B7 /r
VFMSUBADD231PD ymm1, ymm2, ymm3/mem256      C4   RXB.02         1.src2.1.01  B7 /r


Related Instructions 

VFMADDSUBPD, VFMADDSUB132PD, VFMADDSUB213PD, VFMADDSUB231PD,
VFMADDSUBPS, VFMADDSUB132PS, VFMADDSUB213PS, VFMADDSUB231PS,
VFMSUBADDPS, VFMSUBADD132PS, VFMSUBADD213PS, VFMSUBADD231PS

rFLAGS Affected 

None 


MXCSR Flags Affected

MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M   M   M       M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception 
VFMSUBADDPS       Multiply with Alternating Subtract/Add
VFMSUBADD132PS    Packed Single-Precision Floating-Point
VFMSUBADD213PS
VFMSUBADD231PS


Multiplies together two single-precision floating-point vectors, adds even elements of the unrounded 
product to even elements of a third single-precision floating-point vector, and subtracts odd elements 
of the third floating point vector from odd elements of unrounded product. The precise result of each 
addition or subtraction is then rounded to single-precision based on the mode specified by the 
MXCSR[RC] field and written to the corresponding element of the destination. 

The role of each of the source operands specified by the assembly language prototypes given below is 
reflected in the equation in the comment on the right. 

There are two four-operand forms:

VFMSUBADDPS dest, src1, src2/mem, src3    // dest_odd  = (src1_odd * src2_odd/mem_odd) - src3_odd
                                          // dest_even = (src1_even * src2_even/mem_even) + src3_even
VFMSUBADDPS dest, src1, src2, src3/mem    // dest_odd  = (src1_odd * src2_odd) - src3_odd/mem_odd
                                          // dest_even = (src1_even * src2_even) + src3_even/mem_even

and three three-operand forms:

VFMSUBADD132PS src1, src2, src3/mem       // src1_odd  = (src1_odd * src3_odd/mem_odd) - src2_odd
                                          // src1_even = (src1_even * src3_even/mem_even) + src2_even
VFMSUBADD213PS src1, src2, src3/mem       // src1_odd  = (src2_odd * src1_odd) - src3_odd/mem_odd
                                          // src1_even = (src2_even * src1_even) + src3_even/mem_even
VFMSUBADD231PS src1, src2, src3/mem       // src1_odd  = (src2_odd * src3_odd/mem_odd) - src1_odd
                                          // src1_even = (src2_even * src3_even/mem_even) + src1_even


When VEX.L = 0, the vector size is 128 bits (four single-precision elements per vector) and register- 
based source operands are held in XMM registers. 

When VEX.L = 1, the vector size is 256 bits (eight single-precision elements per vector) and register- 
based source operands are held in YMM registers. 

For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or a memory location and the third source 
is a register. 

• When VEX.W = 1, the second source is a register and the third source is either a register or a 
memory location. 

For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third 
operand is either a register or a memory location. 

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the 
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are 
cleared. 




Instruction Support

Form             Subset  Feature Flag
VFMSUBADDPS      FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMSUBADDnnnPS   FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

                                            Encoding
Mnemonic                                    VEX  RXB.mapselect  W.vvvv.L.pp  Opcode
VFMSUBADDPS xmm1, xmm2, xmm3/mem128, xmm4   C4   RXB.03         0.src1.0.01  5E /r /is4
VFMSUBADDPS ymm1, ymm2, ymm3/mem256, ymm4   C4   RXB.03         0.src1.1.01  5E /r /is4
VFMSUBADDPS xmm1, xmm2, xmm3, xmm4/mem128   C4   RXB.03         1.src1.0.01  5E /r /is4
VFMSUBADDPS ymm1, ymm2, ymm3, ymm4/mem256   C4   RXB.03         1.src1.1.01  5E /r /is4
VFMSUBADD132PS xmm1, xmm2, xmm3/mem128      C4   RXB.00010      0.src2.0.01  97 /r
VFMSUBADD132PS ymm1, ymm2, ymm3/mem256      C4   RXB.00010      0.src2.1.01  97 /r
VFMSUBADD213PS xmm1, xmm2, xmm3/mem128      C4   RXB.00010      0.src2.0.01  A7 /r
VFMSUBADD213PS ymm1, ymm2, ymm3/mem256      C4   RXB.00010      0.src2.1.01  A7 /r
VFMSUBADD231PS xmm1, xmm2, xmm3/mem128      C4   RXB.00010      0.src2.0.01  B7 /r
VFMSUBADD231PS ymm1, ymm2, ymm3/mem256      C4   RXB.00010      0.src2.1.01  B7 /r

Related Instructions

VFMADDSUBPD, VFMADDSUB132PD, VFMADDSUB213PD, VFMADDSUB231PD,
VFMADDSUBPS, VFMADDSUB132PS, VFMADDSUB213PS, VFMADDSUB231PS,
VFMSUBADDPD, VFMSUBADD132PD, VFMSUBADD213PD, VFMSUBADD231PD

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M   M   M       M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 
Exceptions

                               Mode
Exception                      Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                        F     Instruction not supported, as indicated by CPUID feature identifier.
                               F     F           FMA instructions are only recognized in protected mode.
                                           F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                           F     Lock prefix (F0h) preceding opcode.
                                           F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                                 see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                  F     CR0.TS = 1.
Stack, #SS                                 F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                    F     Memory address exceeding data segment limit or non-canonical.
                                           F     Null data segment used to reference memory.
Page fault, #PF                            F     Instruction execution caused a page fault.
Alignment check, #AC                       F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                   F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                                 see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Invalid operation, IE                      F     A source operand was an SNaN value.
                                           F     Undefined operation.
Denormalized operand, DE                   F     A source operand was a denormal value.
Overflow, OE                               F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                              F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                              F     A result could not be represented exactly in the destination format.

F - FMA, FMA4 exception
VFMSUBPD        Multiply and Subtract
VFMSUB132PD     Packed Double-Precision Floating-Point
VFMSUB213PD
VFMSUB231PD


Multiplies together two double-precision floating-point vectors and subtracts a third double-precision
floating-point vector from the unrounded product to produce a precise intermediate result. The
intermediate result is then rounded to double-precision based on the mode specified by the MXCSR[RC]
field and written to the destination register. The role of each of the source operands specified by the
assembly language prototypes given below is reflected in the vector equation in the comment on the
right.

There are two four-operand forms:

VFMSUBPD dest, src1, src2/mem, src3       // dest = (src1 * src2/mem) - src3
VFMSUBPD dest, src1, src2, src3/mem       // dest = (src1 * src2) - src3/mem

and three three-operand forms:

VFMSUB132PD src1, src2, src3/mem          // src1 = (src1 * src3/mem) - src2
VFMSUB213PD src1, src2, src3/mem          // src1 = (src2 * src1) - src3/mem
VFMSUB231PD src1, src2, src3/mem          // src1 = (src2 * src3/mem) - src1


For VEX.L = 0, vector size is 128 bits and register-based operands are held in XMM registers. For 
VEX.L = 1, vector size is 256 bits and register-based operands are held in YMM registers. 

For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or a memory location and the third source 
is a register. 

• When VEX.W = 1, the second source is a register and the third source is either a register or a 
memory location. 

For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third 
operand is either a register or a memory location. 

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the 
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are 
cleared. 
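
As a rough illustration of how the digits in the three-operand mnemonics select the operand roles, the following Python sketch models the element-wise arithmetic. It is a simplified model under stated assumptions, not a definitive implementation: the names `fmsub` and `vfmsub_nnn` are hypothetical, inputs must be finite, only round-to-nearest is modeled (other MXCSR[RC] modes are not), and signed zeros and exception flags are ignored. Exact rational arithmetic stands in for the unrounded intermediate result.

```python
from fractions import Fraction

def fmsub(x, y, z):
    """Return x*y - z with a single rounding to double precision (finite inputs only)."""
    exact = Fraction(x) * Fraction(y) - Fraction(z)  # unrounded intermediate result
    return float(exact)                              # one round-to-nearest step

def vfmsub_nnn(form, src1, src2, src3):
    """Hypothetical model of VFMSUB132PD/213PD/231PD; returns the new value of src1."""
    triples = list(zip(src1, src2, src3))
    if form == "132":
        return [fmsub(a, c, b) for a, b, c in triples]  # (src1 * src3) - src2
    if form == "213":
        return [fmsub(b, a, c) for a, b, c in triples]  # (src2 * src1) - src3
    if form == "231":
        return [fmsub(b, c, a) for a, b, c in triples]  # (src2 * src3) - src1
    raise ValueError("form must be '132', '213', or '231'")

# Two double-precision elements, as in the 128-bit (VEX.L = 0) case.
s1, s2, s3 = [2.0, 3.0], [5.0, 7.0], [11.0, 13.0]
r132 = vfmsub_nnn("132", s1, s2, s3)
r213 = vfmsub_nnn("213", s1, s2, s3)
r231 = vfmsub_nnn("231", s1, s2, s3)
```

The three forms compute the same kind of operation and differ only in which register is overwritten and which operand may come from memory.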


Instruction Support 


Form              Subset  Feature Flag
VFMSUBPD          FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMSUBnnnPD       FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
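
The feature-flag test in the table above amounts to checking one bit in each of two ECX values. The sketch below assumes the caller has already executed CPUID for functions 0000_0001h and 8000_0001h and passes in the resulting ECX values; the function name `fma_support` and the sample values are hypothetical and used only for illustration.

```python
FMA_BIT = 12   # CPUID Fn0000_0001_ECX[FMA]
FMA4_BIT = 16  # CPUID Fn8000_0001_ECX[FMA4]

def fma_support(ecx_fn0000_0001, ecx_fn8000_0001):
    """Report which FMA subsets the given CPUID ECX values advertise."""
    return {
        "FMA": bool((ecx_fn0000_0001 >> FMA_BIT) & 1),
        "FMA4": bool((ecx_fn8000_0001 >> FMA4_BIT) & 1),
    }

# Made-up ECX values: FMA advertised, FMA4 not advertised.
support = fma_support(1 << 12, 0)
```

Note that a set feature flag is necessary but not sufficient: the exception table for each instruction lists further conditions, such as CR4.OSXSAVE, that also cause #UD.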
Instruction Encoding

                                              Encoding
Mnemonic                                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFMSUBPD xmm1, xmm2, xmm3/mem128, xmm4        C4   RXB.03          0.src1.0.01  6D /r /is4
VFMSUBPD ymm1, ymm2, ymm3/mem256, ymm4        C4   RXB.03          0.src1.1.01  6D /r /is4
VFMSUBPD xmm1, xmm2, xmm3, xmm4/mem128        C4   RXB.03          1.src1.0.01  6D /r /is4
VFMSUBPD ymm1, ymm2, ymm3, ymm4/mem256        C4   RXB.03          1.src1.1.01  6D /r /is4
VFMSUB132PD xmm1, xmm2, xmm3/mem128           C4   RXB.02          1.src2.0.01  9A /r
VFMSUB132PD ymm1, ymm2, ymm3/mem256           C4   RXB.02          1.src2.1.01  9A /r
VFMSUB213PD xmm1, xmm2, xmm3/mem128           C4   RXB.02          1.src2.0.01  AA /r
VFMSUB213PD ymm1, ymm2, ymm3/mem256           C4   RXB.02          1.src2.1.01  AA /r
VFMSUB231PD xmm1, xmm2, xmm3/mem128           C4   RXB.02          1.src2.0.01  BA /r
VFMSUB231PD ymm1, ymm2, ymm3/mem256           C4   RXB.02          1.src2.1.01  BA /r
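
To make the RXB.map_select and W.vvvv.L.pp columns concrete, the following Python sketch assembles the bytes of a register-only VFMSUB132PD. It is a hedged illustration, not an assembler: the helper names `vex3_prefix` and `encode_vfmsub132pd_xmm` are hypothetical, only XMM registers 0 to 7 are handled, and the ModRM assignment (reg field for the first operand, r/m field for the third) is assumed from the usual VEX convention. The R, X, B, and vvvv fields are stored in inverted form in the prefix.

```python
def vex3_prefix(r, x, b, map_select, w, vvvv, l, pp):
    """Build the three-byte (C4h) VEX prefix from logical, non-inverted field values."""
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (map_select & 0x1F)
    byte2 = ((w & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 0x3)
    return bytes([0xC4, byte1, byte2])

def encode_vfmsub132pd_xmm(src1, src2, src3):
    """Encode VFMSUB132PD xmm(src1), xmm(src2), xmm(src3) for registers 0-7."""
    for reg in (src1, src2, src3):
        if not 0 <= reg <= 7:
            raise ValueError("this sketch only handles registers 0-7")
    # Table row: C4, RXB.02, 1.src2.0.01, opcode 9A /r
    prefix = vex3_prefix(r=0, x=0, b=0, map_select=0x02, w=1, vvvv=src2, l=0, pp=0b01)
    modrm = 0xC0 | (src1 << 3) | src3  # mod = 11b (register), reg = src1, r/m = src3
    return prefix + bytes([0x9A, modrm])

encoded = encode_vfmsub132pd_xmm(1, 2, 3).hex()
```

For VFMSUB132PD xmm1, xmm2, xmm3 the sketch yields the byte sequence C4 E2 E9 9A CB.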


Related Instructions 

VFMSUBPS, VFMSUB132PS, VFMSUB213PS, VFMSUB231PS, VFMSUBSD,
VFMSUB132SD, VFMSUB213SD, VFMSUB231SD, VFMSUBSS, VFMSUB132SS,
VFMSUB213SS, VFMSUB231SS

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M   M   M       M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 
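
The table can be read as a bit map of MXCSR. As a small illustration, the Python sketch below reports which of the flags marked M are set in a given MXCSR value. The function name `fma_flags_set` and the sample values are hypothetical; the bit positions are those shown in the table.

```python
# Exception flag bit positions in MXCSR, per the table.
MXCSR_EXCEPTION_FLAGS = {"IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5}

# Flags marked M (modified) for these instructions; ZE is left blank in the table.
FMA_MODIFIED = ("IE", "DE", "OE", "UE", "PE")

def fma_flags_set(mxcsr):
    """Return the names of the M-marked exception flags that are set in mxcsr."""
    return [name for name in FMA_MODIFIED
            if (mxcsr >> MXCSR_EXCEPTION_FLAGS[name]) & 1]

# Made-up MXCSR value: all exception masks set, plus IE and PE flags.
flags = fma_flags_set(0x1F80 | 0x21)
```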
Exceptions

                               Mode
Exception                      Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                        F     Instruction not supported, as indicated by CPUID feature identifier.
                               F     F           FMA instructions are only recognized in protected mode.
                                           F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                           F     Lock prefix (F0h) preceding opcode.
                                           F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                                 see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                  F     CR0.TS = 1.
Stack, #SS                                 F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                    F     Memory address exceeding data segment limit or non-canonical.
                                           F     Null data segment used to reference memory.
Page fault, #PF                            F     Instruction execution caused a page fault.
Alignment check, #AC                       F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                   F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                                 see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Invalid operation, IE                      F     A source operand was an SNaN value.
                                           F     Undefined operation.
Denormalized operand, DE                   F     A source operand was a denormal value.
Overflow, OE                               F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                              F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                              F     A result could not be represented exactly in the destination format.

F - FMA, FMA4 exception
VFMSUBPS        Multiply and Subtract
VFMSUB132PS     Packed Single-Precision Floating-Point
VFMSUB213PS
VFMSUB231PS


Multiplies together two single-precision floating-point vectors and subtracts a third single-precision
floating-point vector from the unrounded product to produce a precise intermediate result. The
intermediate result is then rounded to single-precision based on the mode specified by the MXCSR[RC]
field and written to the destination register. The role of each of the source operands specified by the
assembly language prototypes given below is reflected in the vector equation in the comment on the
right.

There are two four-operand forms:

VFMSUBPS dest, src1, src2/mem, src3       // dest = (src1 * src2/mem) - src3
VFMSUBPS dest, src1, src2, src3/mem       // dest = (src1 * src2) - src3/mem

and three three-operand forms:

VFMSUB132PS src1, src2, src3/mem          // src1 = (src1 * src3/mem) - src2
VFMSUB213PS src1, src2, src3/mem          // src1 = (src2 * src1) - src3/mem
VFMSUB231PS src1, src2, src3/mem          // src1 = (src2 * src3/mem) - src1


When VEX.L = 0, the vector size is 128 bits (four single-precision elements per vector) and register- 
based source operands are held in XMM registers. 

When VEX.L = 1, the vector size is 256 bits (eight single-precision elements per vector) and register- 
based source operands are held in YMM registers. 

For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or a memory location and the third source 
is a register. 

• When VEX.W = 1, the second source is a register and the third source is either a register or a
memory location.

For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third 
operand is either a register or a memory location. 

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the 
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are 
cleared. 


Instruction Support 


Form              Subset  Feature Flag
VFMSUBPS          FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMSUBnnnPS       FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

                                              Encoding
Mnemonic                                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFMSUBPS xmm1, xmm2, xmm3/mem128, xmm4        C4   RXB.03          0.src1.0.01  6C /r /is4
VFMSUBPS ymm1, ymm2, ymm3/mem256, ymm4        C4   RXB.03          0.src1.1.01  6C /r /is4
VFMSUBPS xmm1, xmm2, xmm3, xmm4/mem128        C4   RXB.03          1.src1.0.01  6C /r /is4
VFMSUBPS ymm1, ymm2, ymm3, ymm4/mem256        C4   RXB.03          1.src1.1.01  6C /r /is4
VFMSUB132PS xmm1, xmm2, xmm3/mem128           C4   RXB.02          0.src2.0.01  9A /r
VFMSUB132PS ymm1, ymm2, ymm3/mem256           C4   RXB.02          0.src2.1.01  9A /r
VFMSUB213PS xmm1, xmm2, xmm3/mem128           C4   RXB.02          0.src2.0.01  AA /r
VFMSUB213PS ymm1, ymm2, ymm3/mem256           C4   RXB.02          0.src2.1.01  AA /r
VFMSUB231PS xmm1, xmm2, xmm3/mem128           C4   RXB.02          0.src2.0.01  BA /r
VFMSUB231PS ymm1, ymm2, ymm3/mem256           C4   RXB.02          0.src2.1.01  BA /r



Related Instructions 

VFMSUBPD, VFMSUB132PD, VFMSUB213PD, VFMSUB231PD, VFMSUBSD, 
VFMSUB132SD, VFMSUB213SD, VFMSUB231SD, VFMSUBSS, VFMSUB132SS, 
VFMSUB213SS, VFMSUB231SS 

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M   M   M       M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 
Exceptions

                               Mode
Exception                      Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                        F     Instruction not supported, as indicated by CPUID feature identifier.
                               F     F           FMA instructions are only recognized in protected mode.
                                           F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                           F     Lock prefix (F0h) preceding opcode.
                                           F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                                 see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                  F     CR0.TS = 1.
Stack, #SS                                 F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                    F     Memory address exceeding data segment limit or non-canonical.
                                           F     Null data segment used to reference memory.
Page fault, #PF                            F     Instruction execution caused a page fault.
Alignment check, #AC                       F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                   F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                                 see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Invalid operation, IE                      F     A source operand was an SNaN value.
                                           F     Undefined operation.
Denormalized operand, DE                   F     A source operand was a denormal value.
Overflow, OE                               F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                              F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                              F     A result could not be represented exactly in the destination format.

F - FMA, FMA4 exception
VFMSUBSD        Multiply and Subtract
VFMSUB132SD     Scalar Double-Precision Floating-Point
VFMSUB213SD
VFMSUB231SD


Multiplies together two double-precision floating-point values and subtracts a third double-precision
floating-point value from the unrounded product to produce a precise intermediate result. The
intermediate result is then rounded to double-precision based on the mode specified by the MXCSR[RC]
field and written to the destination register. The role of each of the source operands specified by the
assembly language prototypes given below is reflected in the vector equation in the comment on the
right.

There are two four-operand forms:

VFMSUBSD dest, src1, src2/mem, src3       // dest = (src1 * src2/mem) - src3
VFMSUBSD dest, src1, src2, src3/mem       // dest = (src1 * src2) - src3/mem

and three three-operand forms:

VFMSUB132SD src1, src2, src3/mem          // src1 = (src1 * src3/mem) - src2
VFMSUB213SD src1, src2, src3/mem          // src1 = (src2 * src1) - src3/mem
VFMSUB231SD src1, src2, src3/mem          // src1 = (src2 * src3/mem) - src1


For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or 64-bit memory location and the third 
source is a register. 

• When VEX.W = 1, the second source is a register and the third source is a register or 64-bit 
memory location. 

For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third 
operand is either a register or a memory location. 

The destination is an XMM register. When the result is written to the destination XMM register, bits 
[127:64] of the destination and bits [255:128] of the corresponding YMM register are cleared. 
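
The practical effect of subtracting from the unrounded product, rather than from a rounded one, can be seen in a small Python sketch. This is an illustration under stated assumptions, not a model of the hardware: the names `fmsub_fused` and `fmsub_unfused` are hypothetical, inputs are finite, round-to-nearest is assumed, and exact rational arithmetic stands in for the precise intermediate result.

```python
from fractions import Fraction

def fmsub_fused(a, b, c):
    """a*b - c with a single rounding at the end, as in a fused multiply-subtract."""
    return float(Fraction(a) * Fraction(b) - Fraction(c))

def fmsub_unfused(a, b, c):
    """a*b - c with the product rounded to double precision before the subtraction."""
    return a * b - c

# The exact product is 1 - 2**-60, which rounds to 1.0 in double precision.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = 1.0

fused = fmsub_fused(a, b, c)
unfused = fmsub_unfused(a, b, c)
```

Here the fused form preserves the small difference -2**-60, while the separately rounded multiply and subtract return 0.0.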


Instruction Support 


Form              Subset  Feature Flag
VFMSUBSD          FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMSUBnnnSD       FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

                                              Encoding
Mnemonic                                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFMSUBSD xmm1, xmm2, xmm3/mem64, xmm4         C4   RXB.03          0.src1.X.01  6F /r /is4
VFMSUBSD xmm1, xmm2, xmm3, xmm4/mem64         C4   RXB.03          1.src1.X.01  6F /r /is4
VFMSUB132SD xmm1, xmm2, xmm3/mem64            C4   RXB.02          1.src2.X.01  9B /r
VFMSUB213SD xmm1, xmm2, xmm3/mem64            C4   RXB.02          1.src2.X.01  AB /r
VFMSUB231SD xmm1, xmm2, xmm3/mem64            C4   RXB.02          1.src2.X.01  BB /r


Related Instructions 

VFMSUBPD, VFMSUB132PD, VFMSUB213PD, VFMSUB231PD, VFMSUBPS, 
VFMSUB132PS, VFMSUB213PS, VFMSUB231PS, VFMSUBSS, VFMSUB132SS, 
VFMSUB213SS, VFMSUB231SS 

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M   M   M       M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 
Exceptions

                               Mode
Exception                      Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                        F     Instruction not supported, as indicated by CPUID feature identifier.
                               F     F           FMA instructions are only recognized in protected mode.
                                           F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                           F     Lock prefix (F0h) preceding opcode.
                                           F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                                 see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                  F     CR0.TS = 1.
Stack, #SS                                 F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                    F     Memory address exceeding data segment limit or non-canonical.
                                           F     Null data segment used to reference memory.
Page fault, #PF                            F     Instruction execution caused a page fault.
Alignment check, #AC                       F     Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF                   F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                                 see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Invalid operation, IE                      F     A source operand was an SNaN value.
                                           F     Undefined operation.
Denormalized operand, DE                   F     A source operand was a denormal value.
Overflow, OE                               F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                              F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                              F     A result could not be represented exactly in the destination format.

F - FMA, FMA4 exception
VFMSUBSS        Multiply and Subtract
VFMSUB132SS     Scalar Single-Precision Floating-Point
VFMSUB213SS
VFMSUB231SS


Multiplies together two single-precision floating-point values and subtracts a third single-precision
floating-point value from the unrounded product to produce a precise intermediate result. The
intermediate result is then rounded to single-precision based on the mode specified by the MXCSR[RC]
field and written to the destination register. The role of each of the source operands specified by the
assembly language prototypes given below is reflected in the vector equation in the comment on the
right.

There are two four-operand forms:

VFMSUBSS dest, src1, src2/mem, src3       // dest = (src1 * src2/mem) - src3
VFMSUBSS dest, src1, src2, src3/mem       // dest = (src1 * src2) - src3/mem

and three three-operand forms:

VFMSUB132SS src1, src2, src3/mem          // src1 = (src1 * src3/mem) - src2
VFMSUB213SS src1, src2, src3/mem          // src1 = (src2 * src1) - src3/mem
VFMSUB231SS src1, src2, src3/mem          // src1 = (src2 * src3/mem) - src1


For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or 32-bit memory location and the third 
source is a register. 

• When VEX.W = 1, the second source is a register and the third source is a register or 32-bit 
memory location. 

For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third 
operand is either a register or a memory location. 

The destination is an XMM register. When the result is written to the destination XMM register, bits 
[127:32] of the XMM register and bits [255:128] of the corresponding YMM register are cleared. 


Instruction Support 


Form              Subset  Feature Flag
VFMSUBSS          FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMSUBnnnSS       FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

                                              Encoding
Mnemonic                                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFMSUBSS xmm1, xmm2, xmm3/mem32, xmm4         C4   RXB.03          0.src1.X.01  6E /r /is4
VFMSUBSS xmm1, xmm2, xmm3, xmm4/mem32         C4   RXB.03          1.src1.X.01  6E /r /is4
VFMSUB132SS xmm1, xmm2, xmm3/mem32            C4   RXB.02          0.src2.X.01  9B /r
VFMSUB213SS xmm1, xmm2, xmm3/mem32            C4   RXB.02          0.src2.X.01  AB /r
VFMSUB231SS xmm1, xmm2, xmm3/mem32            C4   RXB.02          0.src2.X.01  BB /r


Related Instructions 

VFMSUBPD, VFMSUB132PD, VFMSUB213PD, VFMSUB231PD, VFMSUBPS, 
VFMSUB132PS, VFMSUB213PS, VFMSUB231PS, VFMSUBSD, VFMSUB132SD, 
VFMSUB213SD, VFMSUB231SD 

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M   M   M       M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 
Exceptions

                               Mode
Exception                      Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                        F     Instruction not supported, as indicated by CPUID feature identifier.
                               F     F           FMA instructions are only recognized in protected mode.
                                           F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                           F     Lock prefix (F0h) preceding opcode.
                                           F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                                 see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                  F     CR0.TS = 1.
Stack, #SS                                 F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                    F     Memory address exceeding data segment limit or non-canonical.
                                           F     Null data segment used to reference memory.
Page fault, #PF                            F     Instruction execution caused a page fault.
Alignment check, #AC                       F     Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF                   F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                                 see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Invalid operation, IE                      F     A source operand was an SNaN value.
                                           F     Undefined operation.
Denormalized operand, DE                   F     A source operand was a denormal value.
Overflow, OE                               F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                              F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                              F     A result could not be represented exactly in the destination format.

F - FMA, FMA4 exception
VFNMADDPD       Negative Multiply and Add
VFNMADD132PD    Packed Double-Precision Floating-Point
VFNMADD213PD
VFNMADD231PD


Multiplies together two double-precision floating-point vectors, negates the unrounded product, and
adds it to a third double-precision floating-point vector. The precise result is then rounded to
double-precision based on the mode specified by the MXCSR[RC] field and written to the destination
register. The role of each of the source operands specified by the assembly language prototypes given
below is reflected in the vector equation in the comment on the right.

There are two four-operand forms:

VFNMADDPD dest, src1, src2/mem, src3      // dest = -(src1 * src2/mem) + src3
VFNMADDPD dest, src1, src2, src3/mem      // dest = -(src1 * src2) + src3/mem

and three three-operand forms:

VFNMADD132PD src1, src2, src3/mem         // src1 = -(src1 * src3/mem) + src2
VFNMADD213PD src1, src2, src3/mem         // src1 = -(src2 * src1) + src3/mem
VFNMADD231PD src1, src2, src3/mem         // src1 = -(src2 * src3/mem) + src1


When VEX.L = 0, the vector size is 128 bits (two double-precision elements per vector) and register- 
based source operands are held in XMM registers. 

When VEX.L = 1, the vector size is 256 bits (four double-precision elements per vector) and register- 
based source operands are held in YMM registers. 

For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or a memory location and the third source 
is a register. 

• When VEX.W = 1, the second source is a register and the third source is either a register or a 
memory location. 

For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third 
operand is either a register or a memory location. 

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the 
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are 
cleared. 
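
The negate-then-add behavior of the four-operand form can be sketched element by element in Python. This is a simplified model under stated assumptions rather than a definitive implementation: the names `fnmadd` and `vfnmaddpd` are hypothetical, inputs must be finite, only round-to-nearest is modeled, and signed zeros and exception flags are ignored. Exact rational arithmetic stands in for the unrounded product.

```python
from fractions import Fraction

def fnmadd(x, y, z):
    """Return -(x*y) + z with a single rounding to double precision (finite inputs only)."""
    exact = -(Fraction(x) * Fraction(y)) + Fraction(z)  # precise result before rounding
    return float(exact)

def vfnmaddpd(src1, src2, src3):
    """Hypothetical model of the FMA4 form: dest = -(src1 * src2) + src3."""
    n = len(src1)
    if n not in (2, 4) or len(src2) != n or len(src3) != n:
        raise ValueError("expected 2 (VEX.L = 0) or 4 (VEX.L = 1) elements per vector")
    return [fnmadd(a, b, c) for a, b, c in zip(src1, src2, src3)]

# Two double-precision elements, as in the 128-bit case.
dest = vfnmaddpd([2.0, 3.0], [4.0, 5.0], [10.0, 10.0])
```

With these inputs the elements evaluate to 2.0 and -5.0.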


Instruction Support 


Form              Subset  Feature Flag
VFNMADDPD         FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMADDnnnPD      FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

                                              Encoding
Mnemonic                                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFNMADDPD xmm1, xmm2, xmm3/mem128, xmm4       C4   RXB.03          0.src1.0.01  79 /r /is4
VFNMADDPD ymm1, ymm2, ymm3/mem256, ymm4       C4   RXB.03          0.src1.1.01  79 /r /is4
VFNMADDPD xmm1, xmm2, xmm3, xmm4/mem128       C4   RXB.03          1.src1.0.01  79 /r /is4
VFNMADDPD ymm1, ymm2, ymm3, ymm4/mem256       C4   RXB.03          1.src1.1.01  79 /r /is4
VFNMADD132PD xmm1, xmm2, xmm3/mem128          C4   RXB.02          1.src2.0.01  9C /r
VFNMADD132PD ymm1, ymm2, ymm3/mem256          C4   RXB.02          1.src2.1.01  9C /r
VFNMADD213PD xmm1, xmm2, xmm3/mem128          C4   RXB.02          1.src2.0.01  AC /r
VFNMADD213PD ymm1, ymm2, ymm3/mem256          C4   RXB.02          1.src2.1.01  AC /r
VFNMADD231PD xmm1, xmm2, xmm3/mem128          C4   RXB.02          1.src2.0.01  BC /r
VFNMADD231PD ymm1, ymm2, ymm3/mem256          C4   RXB.02          1.src2.1.01  BC /r


Related Instructions 

VFNMADDPS, VFNMADD132PS, VFNMADD213PS, VFNMADD231PS, VFNMADDSD,
VFNMADD132SD, VFNMADD213SD, VFNMADD231SD, VFNMADDSS, VFNMADD132SS,
VFNMADD213SS, VFNMADD231SS

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                            M   M   M       M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 
Exceptions

                               Mode
Exception                      Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                        F     Instruction not supported, as indicated by CPUID feature identifier.
                               F     F           FMA instructions are only recognized in protected mode.
                                           F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                           F     Lock prefix (F0h) preceding opcode.
                                           F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                                 see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                  F     CR0.TS = 1.
Stack, #SS                                 F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                    F     Memory address exceeding data segment limit or non-canonical.
                                           F     Null data segment used to reference memory.
Page fault, #PF                            F     Instruction execution caused a page fault.
Alignment check, #AC                       F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                   F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                                 see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Invalid operation, IE                      F     A source operand was an SNaN value.
                                           F     Undefined operation.
Denormalized operand, DE                   F     A source operand was a denormal value.
Overflow, OE                               F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                              F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                              F     A result could not be represented exactly in the destination format.

F - FMA, FMA4 exception
VFNMADDPS       Negative Multiply and Add
VFNMADD132PS    Packed Single-Precision Floating-Point
VFNMADD213PS
VFNMADD231PS


Multiplies together two single-precision floating-point vectors, negates the unrounded product, and
adds it to a third single-precision floating-point vector. The precise result is then rounded to
single-precision based on the mode specified by the MXCSR[RC] field and written to the destination
register. The role of each of the source operands specified by the assembly language prototypes given
below is reflected in the vector equation in the comment on the right.

There are two four-operand forms:

VFNMADDPS dest, src1, src2/mem, src3      // dest = -(src1 * src2/mem) + src3
VFNMADDPS dest, src1, src2, src3/mem      // dest = -(src1 * src2) + src3/mem

and three three-operand forms:

VFNMADD132PS src1, src2, src3/mem         // src1 = -(src1 * src3/mem) + src2
VFNMADD213PS src1, src2, src3/mem         // src1 = -(src2 * src1) + src3/mem
VFNMADD231PS src1, src2, src3/mem         // src1 = -(src2 * src3/mem) + src1


When VEX.L = 0, the vector size is 128 bits (four single-precision elements per vector) and register- 
based source operands are held in XMM registers. 

When VEX.L = 1, the vector size is 256 bits (eight single-precision elements per vector) and register- 
based source operands are held in YMM registers. 

For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or a memory location and the third source 
is a register. 

• When VEX.W = 1, the second source is a register and the third source is either a register or a 
memory location. 

For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third 
operand is either a register or a memory location. 

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the 
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are 
cleared. 


Instruction Support 


Form              Subset  Feature Flag
VFNMADDPS         FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMADDnnnPS      FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
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Instruction Encoding 

Mnemonic                                  Encoding
                                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFNMADDPS xmm1, xmm2, xmm3/mem128, xmm4   C4   RXB.03          0.src1.0.01  78 /r /is4
VFNMADDPS ymm1, ymm2, ymm3/mem256, ymm4   C4   RXB.03          0.src1.1.01  78 /r /is4
VFNMADDPS xmm1, xmm2, xmm3, xmm4/mem128   C4   RXB.03          1.src1.0.01  78 /r /is4
VFNMADDPS ymm1, ymm2, ymm3, ymm4/mem256   C4   RXB.03          1.src1.1.01  78 /r /is4
VFNMADD132PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  9C /r
VFNMADD132PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  9C /r
VFNMADD213PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  AC /r
VFNMADD213PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  AC /r
VFNMADD231PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  BC /r
VFNMADD231PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  BC /r
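
To make the encoding columns concrete, the following Python sketch assembles the bytes of a register-only three-operand form. It is a minimal illustration, not an assembler. The function name is hypothetical, and two details are assumptions taken from the general VEX convention rather than from this section: the R, X, B and vvvv fields are stored inverted, and the first operand goes in ModRM.reg while the third goes in ModRM.r/m.

```python
def encode_fma3_reg(opcode: int, dest: int, src2: int, src3: int,
                    w: int = 0, l: int = 0) -> bytes:
    """Encode C4, RXB.02, W.src2.L.01, opcode, /r for register operands.

    dest, src2 and src3 are register numbers 0-15.
    """
    r = (dest >> 3) & 1          # extends ModRM.reg (dest)
    b = (src3 >> 3) & 1          # extends ModRM.r/m (src3)
    x = 0                        # no index register in a register-only form
    map_select = 0x02
    # R, X and B are stored inverted (assumed VEX convention).
    byte1 = ((r ^ 1) << 7) | ((x ^ 1) << 6) | ((b ^ 1) << 5) | map_select
    pp = 0b01
    # vvvv holds src2, also stored inverted.
    byte2 = (w << 7) | ((~src2 & 0xF) << 3) | (l << 2) | pp
    modrm = 0xC0 | ((dest & 7) << 3) | (src3 & 7)   # mod = 11b: register operand
    return bytes([0xC4, byte1, byte2, opcode, modrm])


# VFNMADD231PS xmm0, xmm1, xmm2 (opcode BC /r, W = 0, L = 0)
print(encode_fma3_reg(0xBC, dest=0, src2=1, src3=2).hex(" "))   # c4 e2 71 bc c2
```

Memory operands and the four-operand /is4 forms need additional bytes and are left out of the sketch.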


Related Instructions 

VFNMADDPD, VFNMADD132PD, VFNMADD213PD, VFNMADD231PD, VFNMADDSD, 
VFNMADD132SD, VFNMADD213SD, VFNMADD231SD, VFNMADDSS, VFNMADD132SS,
VFNMADD213SS, VFNMADD231SS 

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                     M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 
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Exceptions 


Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception 
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VFNMADDSD Negative Multiply and Add 

VFNMADD132SD Scalar Double-Precision Floating-Point 

VFNMADD213SD 
VFNMADD231SD 


Multiplies together two double-precision floating-point values, negates the unrounded product, and
adds it to a third double-precision floating-point value. The precise result is then rounded to
double-precision based on the mode specified by the MXCSR[RC] field and written to the destination
register. The role of each of the source operands specified by the assembly language prototypes given
below is reflected in the equation in the comment on the right.

There are two four-operand forms: 


VFNMADDSD dest, src1, src2/mem, src3    // dest = -(src1 * src2/mem) + src3
VFNMADDSD dest, src1, src2, src3/mem    // dest = -(src1 * src2) + src3/mem

and three three-operand forms:

VFNMADD132SD src1, src2, src3/mem       // src1 = -(src1 * src3/mem) + src2
VFNMADD213SD src1, src2, src3/mem       // src1 = -(src2 * src1) + src3/mem
VFNMADD231SD src1, src2, src3/mem       // src1 = -(src2 * src3/mem) + src1


For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or 64-bit memory location and the third 
source is a register. 

• When VEX.W = 1, the second source is a register and the third source is a register or 64-bit 
memory location. 

For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third 
operand is either a register or a 64-bit memory location. 

The destination is an XMM register. When the result is written to the destination, bits [127:64] of the 
XMM register and bits [255:128] of the corresponding YMM register are cleared. 
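
The effect of negating and adding the unrounded product can be illustrated with a small Python model. This is a sketch under stated assumptions, not a description of the hardware: the helper name is hypothetical, only round-to-nearest is modeled, inputs must be finite, and the sign of a zero result is not reproduced.

```python
from fractions import Fraction


def fnmadd_sd(a: float, b: float, c: float) -> float:
    """Model -(a * b) + c with a single rounding at the end.

    Fraction arithmetic keeps the product and the sum exact; float()
    then rounds once, to nearest-even.
    """
    exact = -(Fraction(a) * Fraction(b)) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -52
b = 1.0 - 2.0 ** -52
fused = fnmadd_sd(a, b, 1.0)      # exact product 1 - 2**-104 survives
unfused = -(a * b) + 1.0          # product is rounded to 1.0 first
print(fused == 2.0 ** -104, unfused == 0.0)   # True True
```

The two results differ because a separate multiply rounds the product before the addition, while the fused form rounds only the final sum.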


Instruction Support


Form            Subset   Feature Flag
VFNMADDSD       FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMADDnnnSD    FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
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Instruction Encoding 

Mnemonic                                  Encoding
                                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFNMADDSD xmm1, xmm2, xmm3/mem64, xmm4    C4   RXB.03          0.src1.X.01  7B /r /is4
VFNMADDSD xmm1, xmm2, xmm3, xmm4/mem64    C4   RXB.03          1.src1.X.01  7B /r /is4
VFNMADD132SD xmm1, xmm2, xmm3/mem64       C4   RXB.02          1.src2.X.01  9D /r
VFNMADD213SD xmm1, xmm2, xmm3/mem64       C4   RXB.02          1.src2.X.01  AD /r
VFNMADD231SD xmm1, xmm2, xmm3/mem64       C4   RXB.02          1.src2.X.01  BD /r


Related Instructions 

VFNMADDPD, VFNMADD132PD, VFNMADD213PD, VFNMADD231PD, VFNMADDPS, 
VFNMADD132PS, VFNMADD213PS, VFNMADD231PS, VFNMADDSS, VFNMADD132SS,
VFNMADD213SS, VFNMADD231SS 

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                     M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 
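
The flag layout can be exercised with a short Python sketch that decodes an MXCSR value and reports which of the M-marked status flags were newly set. Bit positions follow the MXCSR layout in the table above. The function names and sample values are hypothetical, and the sketch only inspects values supplied by the caller.

```python
# Bit positions of the single-bit MXCSR fields.
MXCSR_BITS = {
    "IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5,
    "DAZ": 6, "IM": 7, "DM": 8, "ZM": 9, "OM": 10, "UM": 11, "PM": 12,
    "FZ": 15, "MM": 17,
}
# Status flags marked M (modified) for these instructions.
MODIFIED_FLAGS = ("PE", "UE", "OE", "DE", "IE")


def decode_mxcsr(value: int) -> dict:
    """Split an MXCSR value into named fields."""
    fields = {name: (value >> bit) & 1 for name, bit in MXCSR_BITS.items()}
    fields["RC"] = (value >> 13) & 0b11   # two-bit rounding control, bits 14:13
    return fields


def raised_flags(before: int, after: int) -> list:
    """Names of the M-marked status flags that went from 0 to 1."""
    old, new = decode_mxcsr(before), decode_mxcsr(after)
    return [f for f in MODIFIED_FLAGS if new[f] and not old[f]]


# 0x1F80: all six exception masks set, RC = 00b (sample value).
print(raised_flags(0x1F80, 0x1F80 | (1 << 5) | (1 << 4)))   # ['PE', 'UE']
```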
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Exceptions 


Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception 
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VFNMADDSS Negative Multiply and Add 

VFNMADD132SS Scalar Single-Precision Floating-Point 

VFNMADD213SS 
VFNMADD231SS 


Multiplies together two single-precision floating-point values, negates the unrounded product, and
adds it to a third single-precision floating-point value. The precise result is then rounded to
single-precision based on the mode specified by the MXCSR[RC] field and written to the destination
register. The role of each of the source operands specified by the assembly language prototypes given
below is reflected in the equation in the comment on the right.

There are two four-operand forms: 


VFNMADDSS dest, src1, src2/mem, src3    // dest = -(src1 * src2/mem) + src3
VFNMADDSS dest, src1, src2, src3/mem    // dest = -(src1 * src2) + src3/mem

and three three-operand forms:

VFNMADD132SS src1, src2, src3/mem       // src1 = -(src1 * src3/mem) + src2
VFNMADD213SS src1, src2, src3/mem       // src1 = -(src2 * src1) + src3/mem
VFNMADD231SS src1, src2, src3/mem       // src1 = -(src2 * src3/mem) + src1


For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or 32-bit memory location and the third 
source is a register. 

• When VEX.W = 1, the second source is a register and the third source is a register or 32-bit 
memory location. 

For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third 
operand is either a register or a 32-bit memory location. 

The destination is an XMM register. When the result is written to the destination, bits [127:32] of the 
XMM register and bits [255:128] of the corresponding YMM register are cleared. 
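
The clearing rule for the destination can be pictured with a small Python model that treats the 256-bit YMM register as an integer. This is only a sketch: the function name is hypothetical, and the result value is passed in rather than computed.

```python
import struct


def write_scalar_single(old_ymm: int, result: float) -> int:
    """Return the 256-bit YMM image after a scalar single-precision write.

    Bits [31:0] receive the result. Bits [127:32] of the XMM register and
    bits [255:128] of the YMM register are cleared, whatever they held.
    """
    low32 = struct.unpack("<I", struct.pack("<f", result))[0]
    keep_mask = 0            # no bit of the old register contents is preserved
    return (old_ymm & keep_mask) | low32


old = (1 << 256) - 1         # every bit set before the write
print(hex(write_scalar_single(old, -1.5)))   # 0xbfc00000
```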


Instruction Support


Form            Subset   Feature Flag
VFNMADDSS       FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMADDnnnSS    FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
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Instruction Encoding 

Mnemonic                                  Encoding
                                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFNMADDSS xmm1, xmm2, xmm3/mem32, xmm4    C4   RXB.03          0.src1.X.01  7A /r /is4
VFNMADDSS xmm1, xmm2, xmm3, xmm4/mem32    C4   RXB.03          1.src1.X.01  7A /r /is4
VFNMADD132SS xmm1, xmm2, xmm3/mem32       C4   RXB.02          0.src2.X.01  9D /r
VFNMADD213SS xmm1, xmm2, xmm3/mem32       C4   RXB.02          0.src2.X.01  AD /r
VFNMADD231SS xmm1, xmm2, xmm3/mem32       C4   RXB.02          0.src2.X.01  BD /r


Related Instructions 

VFNMADDPD, VFNMADD132PD, VFNMADD213PD, VFNMADD231PD, VFNMADDPS,
VFNMADD132PS, VFNMADD213PS, VFNMADD231PS, VFNMADDSD, VFNMADD132SD,
VFNMADD213SD, VFNMADD231SD

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                     M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 
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Exceptions 


Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception 
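
Two of the #UD causes listed above depend on operating-system state rather than on the processor alone. The following Python sketch checks them on values supplied by the caller. The function name is hypothetical, and the OSXSAVE bit position (bit 27 of CPUID Fn0000_0001_ECX) is an assumption not stated in this section.

```python
OSXSAVE_BIT = 27          # CPUID Fn0000_0001_ECX[OSXSAVE] (bit position assumed)


def fma_state_enabled(ecx_fn0000_0001: int, xfeature_enabled_mask: int) -> bool:
    """True when neither of the two OS-related #UD causes applies."""
    osxsave = (ecx_fn0000_0001 >> OSXSAVE_BIT) & 1
    if not osxsave:
        return False                                   # CR4.OSXSAVE = 0
    # XFEATURE_ENABLED_MASK[2:1] must equal 11b.
    return ((xfeature_enabled_mask >> 1) & 0b11) == 0b11


print(fma_state_enabled(1 << 27, 0b111))   # True
print(fma_state_enabled(1 << 27, 0b011))   # False
```

The remaining #UD causes (unsupported instruction, processor mode, illegal prefixes) are outside the scope of this check.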
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VFNMSUBPD Negative Multiply and Subtract 

VFNMSUB132PD Packed Double-Precision Floating-Point 

VFNMSUB213PD 
VFNMSUB231PD 


Multiplies together two double-precision floating-point vectors, negates the unrounded product, and 
subtracts a third double-precision floating-point vector from it. The precise result is then rounded to 
double-precision based on the mode specified by the MXCSR[RC] field and written to the destination 
register. The role of each of the source operands specified by the assembly language prototypes given 
below is reflected in the vector equation in the comment on the right. 

There are two four-operand forms: 


VFNMSUBPD dest, src1, src2/mem, src3    // dest = -(src1 * src2/mem) - src3
VFNMSUBPD dest, src1, src2, src3/mem    // dest = -(src1 * src2) - src3/mem

and three three-operand forms:

VFNMSUB132PD src1, src2, src3/mem       // src1 = -(src1 * src3/mem) - src2
VFNMSUB213PD src1, src2, src3/mem       // src1 = -(src2 * src1) - src3/mem
VFNMSUB231PD src1, src2, src3/mem       // src1 = -(src2 * src3/mem) - src1


When VEX.L = 0, the vector size is 128 bits (two double-precision elements per vector) and
register-based source operands are held in XMM registers.

When VEX.L = 1, the vector size is 256 bits (four double-precision elements per vector) and
register-based source operands are held in YMM registers.

For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or a memory location and the third source 
is a register. 

• When VEX.W = 1, the second source is a register and the third source is either a register or a 
memory location. 

For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third 
operand is either a register or a memory location. 

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the 
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are 
cleared. 
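
The packed operation applies the same equation independently to each element. A minimal Python sketch of that behavior follows. It is an illustration only: the function name is hypothetical, vectors are modeled as lists of Python floats, only round-to-nearest is modeled, and inputs are assumed finite.

```python
from fractions import Fraction


def vfnmsubpd(src_a, src_b, src_c, vex_l: int):
    """Element-wise -(a * b) - c over 2 (VEX.L = 0) or 4 (VEX.L = 1) doubles.

    Each lane is computed exactly and rounded once.
    """
    lanes = 4 if vex_l else 2
    if not (len(src_a) == len(src_b) == len(src_c) == lanes):
        raise ValueError("operand length does not match the vector size")
    return [float(-(Fraction(a) * Fraction(b)) - Fraction(c))
            for a, b, c in zip(src_a, src_b, src_c)]


print(vfnmsubpd([1.0, 2.0], [3.0, 4.0], [0.5, -8.0], vex_l=0))   # [-3.5, 0.0]
```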


Instruction Support 


Form            Subset   Feature Flag
VFNMSUBPD       FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMSUBnnnPD    FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
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Instruction Encoding 

Mnemonic                                  Encoding
                                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFNMSUBPD xmm1, xmm2, xmm3/mem128, xmm4   C4   RXB.03          0.src1.0.01  7D /r /is4
VFNMSUBPD ymm1, ymm2, ymm3/mem256, ymm4   C4   RXB.03          0.src1.1.01  7D /r /is4
VFNMSUBPD xmm1, xmm2, xmm3, xmm4/mem128   C4   RXB.03          1.src1.0.01  7D /r /is4
VFNMSUBPD ymm1, ymm2, ymm3, ymm4/mem256   C4   RXB.03          1.src1.1.01  7D /r /is4
VFNMSUB132PD xmm1, xmm2, xmm3/mem128      C4   RXB.02          1.src2.0.01  9E /r
VFNMSUB132PD ymm1, ymm2, ymm3/mem256      C4   RXB.02          1.src2.1.01  9E /r
VFNMSUB213PD xmm1, xmm2, xmm3/mem128      C4   RXB.02          1.src2.0.01  AE /r
VFNMSUB213PD ymm1, ymm2, ymm3/mem256      C4   RXB.02          1.src2.1.01  AE /r
VFNMSUB231PD xmm1, xmm2, xmm3/mem128      C4   RXB.02          1.src2.0.01  BE /r
VFNMSUB231PD ymm1, ymm2, ymm3/mem256      C4   RXB.02          1.src2.1.01  BE /r


Related Instructions 

VFNMSUBPS, VFNMSUB132PS, VFNMSUB213PS, VFNMSUB231PS, VFNMSUBSD,
VFNMSUB132SD, VFNMSUB213SD, VFNMSUB231SD, VFNMSUBSS, VFNMSUB132SS,
VFNMSUB213SS, VFNMSUB231SS 

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                     M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 
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Exceptions 


Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception 
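
The IE and DE causes above depend on the bit pattern of a source element. The following Python sketch classifies a double-precision bit pattern accordingly. It is a hedged illustration: the function names are hypothetical, and the effect of MXCSR.DAZ on denormal inputs is not modeled.

```python
import struct


def classify_double(bits: int) -> str:
    """Classify a 64-bit pattern as 'snan', 'qnan', 'denormal' or 'other'."""
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exponent == 0x7FF and fraction != 0:
        # The most significant fraction bit distinguishes quiet from signaling.
        return "qnan" if fraction >> 51 else "snan"
    if exponent == 0 and fraction != 0:
        return "denormal"
    return "other"


def double_bits(x: float) -> int:
    """Return the raw 64-bit encoding of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


print(classify_double(0x7FF0000000000001))    # snan: candidate for IE
print(classify_double(double_bits(5e-324)))   # denormal: candidate for DE
```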
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VFNMSUBPS Negative Multiply and Subtract 

VFNMSUB132PS Packed Single-Precision Floating-Point 

VFNMSUB213PS 
VFNMSUB231PS 


Multiplies together two single-precision floating-point vectors, negates the unrounded product, and 
subtracts a third single-precision floating-point vector from it. The precise result is then rounded to 
single-precision based on the mode specified by the MXCSR[RC] field and written to the destination 
register. The role of each of the source operands specified by the assembly language prototypes given 
below is reflected in the vector equation in the comment on the right. 

There are two four-operand forms: 


VFNMSUBPS dest, src1, src2/mem, src3    // dest = -(src1 * src2/mem) - src3
VFNMSUBPS dest, src1, src2, src3/mem    // dest = -(src1 * src2) - src3/mem

and three three-operand forms:

VFNMSUB132PS src1, src2, src3/mem       // src1 = -(src1 * src3/mem) - src2
VFNMSUB213PS src1, src2, src3/mem       // src1 = -(src2 * src1) - src3/mem
VFNMSUB231PS src1, src2, src3/mem       // src1 = -(src2 * src3/mem) - src1


When VEX.L = 0, the vector size is 128 bits (four single-precision elements per vector) and
register-based source operands are held in XMM registers.

When VEX.L = 1, the vector size is 256 bits (eight single-precision elements per vector) and
register-based source operands are held in YMM registers.

For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or a memory location and the third source 
is a register. 

• When VEX.W = 1, the second source is a register and the third source is either a register or a 
memory location. 

For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third 
operand is either a register or a memory location. 

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the 
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are 
cleared. 


Instruction Support 


Form            Subset   Feature Flag
VFNMSUBPS       FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMSUBnnnPS    FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
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Instruction Encoding 

Mnemonic                                  Encoding
                                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFNMSUBPS xmm1, xmm2, xmm3/mem128, xmm4   C4   RXB.03          0.src1.0.01  7C /r /is4
VFNMSUBPS ymm1, ymm2, ymm3/mem256, ymm4   C4   RXB.03          0.src1.1.01  7C /r /is4
VFNMSUBPS xmm1, xmm2, xmm3, xmm4/mem128   C4   RXB.03          1.src1.0.01  7C /r /is4
VFNMSUBPS ymm1, ymm2, ymm3, ymm4/mem256   C4   RXB.03          1.src1.1.01  7C /r /is4
VFNMSUB132PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  9E /r
VFNMSUB132PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  9E /r
VFNMSUB213PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  AE /r
VFNMSUB213PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  AE /r
VFNMSUB231PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  BE /r
VFNMSUB231PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  BE /r


Related Instructions 

VFNMSUBPD, VFNMSUB132PD, VFNMSUB213PD, VFNMSUB231PD, VFNMSUBSD, 
VFNMSUB132SD, VFNMSUB213SD, VFNMSUB231SD, VFNMSUBSS, VFNMSUB132SS,
VFNMSUB213SS, VFNMSUB231SS 

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                     M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 
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Exceptions 


Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception 
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VFNMSUBSD Negative Multiply and Subtract 

VFNMSUB132SD Scalar Double-Precision Floating-Point 

VFNMSUB213SD 
VFNMSUB231SD 


Multiplies together two double-precision floating-point values, negates the unrounded product, and 
subtracts a third double-precision floating-point value from it. The precise result is then rounded to 
double-precision based on the mode specified by the MXCSR[RC] field and written to the destination 
register. The role of each of the source operands specified by the assembly language prototypes given 
below is reflected in the equation in the comment on the right. 

There are two four-operand forms: 


VFNMSUBSD dest, src1, src2/mem, src3    // dest = -(src1 * src2/mem) - src3
VFNMSUBSD dest, src1, src2, src3/mem    // dest = -(src1 * src2) - src3/mem

and three three-operand forms:

VFNMSUB132SD src1, src2, src3/mem       // src1 = -(src1 * src3/mem) - src2
VFNMSUB213SD src1, src2, src3/mem       // src1 = -(src2 * src1) - src3/mem
VFNMSUB231SD src1, src2, src3/mem       // src1 = -(src2 * src3/mem) - src1


For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or a 64-bit memory location and the third 
source is a register. 

• When VEX.W = 1, the second source is a register and the third source is either a register or a 64-bit 
memory location. 

For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third 
operand is either a register or a 64-bit memory location. 

The destination is an XMM register. Bits [127:64] of the destination XMM register and bits [255:128] 
of the corresponding YMM register are cleared. 
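
The description states that the precise result is rounded according to MXCSR[RC]. The following Python sketch shows how the four rounding modes can change the stored double. It is an illustration under stated assumptions: the RC encodings are taken from the general MXCSR definition rather than from this section, the function names are hypothetical, overflow and signed zeros are not modeled, and `math.nextafter` requires Python 3.9 or later.

```python
import math
from fractions import Fraction

# MXCSR[RC] encodings (assumed from the general MXCSR definition).
RC_NEAREST, RC_DOWN, RC_UP, RC_ZERO = 0b00, 0b01, 0b10, 0b11


def round_exact(exact: Fraction, rc: int) -> float:
    """Round an exact value to double under the given rounding control."""
    nearest = float(exact)                    # round-to-nearest-even
    if rc == RC_NEAREST or Fraction(nearest) == exact:
        return nearest
    if rc == RC_ZERO:
        rc = RC_DOWN if exact > 0 else RC_UP  # truncation depends on sign
    if rc == RC_DOWN and Fraction(nearest) > exact:
        return math.nextafter(nearest, -math.inf)
    if rc == RC_UP and Fraction(nearest) < exact:
        return math.nextafter(nearest, math.inf)
    return nearest


def fnmsub_sd(a: float, b: float, c: float, rc: int) -> float:
    """Model -(a * b) - c with one rounding under rounding control rc."""
    return round_exact(-(Fraction(a) * Fraction(b)) - Fraction(c), rc)


# The exact result is -(1 + 2**-60), which lies between two doubles.
for rc in (RC_NEAREST, RC_DOWN, RC_UP, RC_ZERO):
    print(rc, fnmsub_sd(1.0, 1.0, 2.0 ** -60, rc))
```

Only the round-down mode moves this result away from -1.0, to the next double below it.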


Instruction Support


Form            Subset   Feature Flag
VFNMSUBSD       FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMSUBnnnSD    FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
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Instruction Encoding 

Mnemonic                                  Encoding
                                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFNMSUBSD xmm1, xmm2, xmm3/mem64, xmm4    C4   RXB.03          0.src1.X.01  7F /r /is4
VFNMSUBSD xmm1, xmm2, xmm3, xmm4/mem64    C4   RXB.03          1.src1.X.01  7F /r /is4
VFNMSUB132SD xmm1, xmm2, xmm3/mem64       C4   RXB.02          1.src2.X.01  9F /r
VFNMSUB213SD xmm1, xmm2, xmm3/mem64       C4   RXB.02          1.src2.X.01  AF /r
VFNMSUB231SD xmm1, xmm2, xmm3/mem64       C4   RXB.02          1.src2.X.01  BF /r


Related Instructions 

VFNMSUBPD, VFNMSUB132PD, VFNMSUB213PD, VFNMSUB231PD, VFNMSUBPS, 
VFNMSUB132PS, VFNMSUB213PS, VFNMSUB231PS, VFNMSUBSS, VFNMSUB132SS,
VFNMSUB213SS, VFNMSUB231SS 

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                     M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6    5    4    3    2    1    0
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Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 
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Exceptions 


Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception 
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VFNMSUBSS Negative Multiply and Subtract 

VFNMSUB132SS Scalar Single-Precision Floating-Point 

VFNMSUB213SS 
VFNMSUB231SS 


Multiplies together two single-precision floating-point values, negates the unrounded product, and 
subtracts a third single-precision floating-point value from it. The precise result is then rounded to 
single-precision based on the mode specified by the MXCSR[RC] field and written to the destination 
register. The role of each of the source operands specified by the assembly language prototypes given 
below is reflected in the equation in the comment on the right. 

There are two four-operand forms: 


VFNMSUBSS dest, src1, src2/mem, src3    // dest = -(src1 * src2/mem) - src3
VFNMSUBSS dest, src1, src2, src3/mem    // dest = -(src1 * src2) - src3/mem

and three three-operand forms:

VFNMSUB132SS src1, src2, src3/mem       // src1 = -(src1 * src3/mem) - src2
VFNMSUB213SS src1, src2, src3/mem       // src1 = -(src2 * src1) - src3/mem
VFNMSUB231SS src1, src2, src3/mem       // src1 = -(src2 * src3/mem) - src1
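
The three digits in the three-operand mnemonics encode the operand roles given in the equations: the first two digits name the multiplicands and the third names the operand that is subtracted. The following Python sketch expresses that mapping. The function name is hypothetical, and the sketch returns the exact value before rounding, so it does not model the final rounding to single precision.

```python
from fractions import Fraction


def vfnmsub_nnn_exact(order: str, src1: float, src2: float, src3: float) -> Fraction:
    """Exact value that VFNMSUBnnnSS would round and write to src1.

    order is "132", "213" or "231".
    """
    if order not in ("132", "213", "231"):
        raise ValueError("unknown operand order")
    operands = {"1": Fraction(src1), "2": Fraction(src2), "3": Fraction(src3)}
    x, y, z = (operands[digit] for digit in order)
    return -(x * y) - z


for order in ("132", "213", "231"):
    print(order, vfnmsub_nnn_exact(order, 2.0, 3.0, 5.0))
# 132 -13
# 213 -11
# 231 -17
```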


For the four-operand forms, VEX.W determines operand configuration. 

• When VEX.W = 0, the second source is either a register or a 32-bit memory location and the third 
source is a register. 

• When VEX.W = 1, the second source is a register and the third source is either a register or a 32-bit 
memory location. 

For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third 
operand is either a register or a 32-bit memory location. 

The destination is an XMM register. Bits [127:32] of the destination XMM register and bits [255:128]
of the corresponding YMM register are cleared.


Instruction Support 


Form            Subset   Feature Flag
VFNMSUBSS       FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMSUBnnnSS    FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
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Instruction Encoding 

Mnemonic                                  Encoding
                                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFNMSUBSS xmm1, xmm2, xmm3/mem32, xmm4    C4   RXB.03          0.src1.X.01  7E /r /is4
VFNMSUBSS xmm1, xmm2, xmm3, xmm4/mem32    C4   RXB.03          1.src1.X.01  7E /r /is4
VFNMSUB132SS xmm1, xmm2, xmm3/mem32       C4   RXB.02          0.src2.X.01  9F /r
VFNMSUB213SS xmm1, xmm2, xmm3/mem32       C4   RXB.02          0.src2.X.01  AF /r
VFNMSUB231SS xmm1, xmm2, xmm3/mem32       C4   RXB.02          0.src2.X.01  BF /r


Related Instructions 

VFNMSUBPD, VFNMSUB132PD, VFNMSUB213PD, VFNMSUB231PD, VFNMSUBPS, 
VFNMSUB132PS, VFNMSUB213PS, VFNMSUB231PS, VFNMSUBSD, VFNMSUB132SD,
VFNMSUB213SD, VFNMSUB231SD

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                     M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 




Exceptions 


Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                              see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception 
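
Two of the #UD causes above depend on operating-system state rather than on processor support. The sketch below shows one way to combine them. It is an illustration only: the OSXSAVE bit position (bit 27 of CPUID Fn0000_0001_ECX) is an assumption not stated in this section, the caller is assumed to supply the raw ECX and XFEATURE_ENABLED_MASK values, and `os_enables_ymm_state` is a hypothetical name.

```python
OSXSAVE_BIT = 27  # assumed position of CPUID Fn0000_0001_ECX[OSXSAVE]


def os_enables_ymm_state(ecx_fn0000_0001, xfeature_enabled_mask):
    """Return True when neither OS-related #UD condition applies."""
    osxsave = (ecx_fn0000_0001 >> OSXSAVE_BIT) & 1
    if not osxsave:
        return False  # CR4.OSXSAVE = 0
    # XFEATURE_ENABLED_MASK[2:1] must equal 11b.
    return (xfeature_enabled_mask >> 1) & 0b11 == 0b11
```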
VFRCZPD Extract Fraction

Packed Double-Precision Floating-Point 

Extracts the fractional portion of each double-precision floating-point value of either a source register 
or a memory location and writes the resulting values to the corresponding elements of the destination. 
The fractional results are precise. 

• When XOP.L = 0, the source is either an XMM register or a 128-bit memory location. 

• When XOP.L = 1, the source is a YMM register or 256-bit memory location. 

When the destination is an XMM register, bits [255:128] of the corresponding YMM register are 
cleared. 

Exception conditions are the same as for other arithmetic instructions, except with respect to the sign 
of a zero result. A zero is returned in the following cases: 

• When the operand is a zero. 

• When the operand is a normal integer. 

• When the operand is a denormal value and is coerced to zero by MXCSR.DAZ. 

• When the operand is a denormal value that is not coerced to zero by MXCSR.DAZ. 

In the first three cases, when MXCSR.RC = 01b (round toward −∞) the sign of the zero result is
negative, and is otherwise positive.

In the fourth case, the operand is its own fractional part, which results in underflow, and the result is 
forced to zero by MXCSR.FZ; the result has the same sign as the operand. 
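
The zero-result rules above can be modeled for a single double-precision element. This is a sketch under stated assumptions, not a description of the hardware: it assumes the fraction is the operand minus its truncated integer part (so it keeps the operand's sign), it assumes a denormal operand passes through unchanged when neither DAZ nor FZ applies, and it does not model NaN or infinite operands or exception flags. The function name `frcz` is hypothetical.

```python
import math
import sys

RC_ROUND_DOWN = 0b01  # MXCSR.RC encoding for round toward negative infinity


def frcz(x, rc=0b00, daz=False, fz=False):
    """Model the fraction extraction and the sign rules for a zero result."""
    if math.isnan(x) or math.isinf(x):
        raise ValueError("NaN and infinity are not modeled in this sketch")
    is_denormal = x != 0.0 and abs(x) < sys.float_info.min
    if is_denormal and not daz:
        # Fourth case: the operand is its own fraction; FZ forces a zero
        # that keeps the sign of the operand.
        return math.copysign(0.0, x) if fz else x
    if is_denormal:
        x = 0.0  # third case: operand coerced to zero by DAZ
    frac, _ = math.modf(x)  # assumption: fraction keeps the sign of x
    if frac == 0.0:
        # First three cases: the sign depends only on the rounding mode.
        return -0.0 if rc == RC_ROUND_DOWN else 0.0
    return frac
```

In this model `frcz(2.75)` yields 0.75, while `frcz(3.0, rc=0b01)` yields a negative zero and `frcz(3.0)` a positive zero.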

Instruction Support

Form       Subset  Feature Flag
VFRCZPD    XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                      Encoding
Mnemonic                     XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VFRCZPD xmm1, xmm2/mem128    8F   RXB.09          0.1111.0.00  81 /r
VFRCZPD ymm1, ymm2/mem256    8F   RXB.09          0.1111.1.00  81 /r
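
The fields in the encoding table can be assembled into bytes as sketched below. The bit layout used here (R, X, B in bits 7:5 and map_select in bits 4:0 of the second byte; W in bit 7, vvvv in bits 6:3, L in bit 2, and pp in bits 1:0 of the third byte) is an assumption taken from the general three-byte prefix format, not from this section. The helpers `xop_prefix` and `vfrczpd_reg` are hypothetical and handle only register numbers 0 through 7 with a register source.

```python
def xop_prefix(r, x, b, map_select, w, vvvv, l, pp):
    """Assemble the three XOP prefix bytes from the fields in the table.

    r, x, b and vvvv are given as the stored bit values.
    """
    byte0 = 0x8F
    byte1 = (r << 7) | (x << 6) | (b << 5) | (map_select & 0x1F)
    byte2 = (w << 7) | ((vvvv & 0xF) << 3) | (l << 2) | (pp & 0b11)
    return bytes([byte0, byte1, byte2])


def vfrczpd_reg(dest, src, ymm=False):
    """Encode VFRCZPD for register numbers 0..7 (no extension bits needed)."""
    prefix = xop_prefix(1, 1, 1, 0x09, 0, 0b1111, 1 if ymm else 0, 0b00)
    modrm = 0xC0 | (dest << 3) | src  # mod = 11b: register-direct source
    return prefix + bytes([0x81, modrm])
```

Under these assumptions the 128-bit form with destination register 1 and source register 2 encodes as the byte sequence 8F E9 78 81 CA.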


Related Instructions 

(V)ROUNDPD, (V)ROUNDPS, (V)ROUNDSD, (V)ROUNDSS, VFRCZPS, VFRCZSS, VFRCZSD

rFLAGS Affected 

None 
MXCSR Flags Affected

Flag  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                             M   M           M   M
Bit   17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.vvvv != 1111b.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.
                                              See SIMD Floating-Point Exceptions below for details.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                   X     A source operand was an SNaN value.
                                        X     Undefined operation.
Denormalized operand, DE                X     A source operand was a denormal value.
Underflow, UE                           X     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           X     A result could not be represented exactly in the destination format.

X — XOP exception 
VFRCZPS Extract Fraction

Packed Single-Precision Floating-Point 

Extracts the fractional portion of each single-precision floating-point value of either a source register 
or a memory location and writes the resulting values to the corresponding elements of the destination. 
The fractional results are exact. 

• When XOP.L = 0, the source is either an XMM register or a 128-bit memory location. 

• When XOP.L = 1, the source is a YMM register or 256-bit memory location. 

When the destination is an XMM register, bits [255:128] of the corresponding YMM register are 
cleared. 

Exception conditions are the same as for other arithmetic instructions, except with respect to the sign 
of a zero result. A zero is returned in the following cases: 

• When the operand is a zero. 

• When the operand is a normal integer. 

• When the operand is a denormal value and is coerced to zero by MXCSR.DAZ. 

• When the operand is a denormal value that is not coerced to zero by MXCSR.DAZ. 

In the first three cases, when MXCSR.RC = 01b (round toward −∞) the sign of the zero result is
negative, and is otherwise positive.

In the fourth case, the operand is its own fractional part, which results in underflow, and the result is 
forced to zero by MXCSR.FZ; the result has the same sign as the operand. 

Instruction Support

Form       Subset  Feature Flag
VFRCZPS    XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                      Encoding
Mnemonic                     XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VFRCZPS xmm1, xmm2/mem128    8F   RXB.09          0.1111.0.00  80 /r
VFRCZPS ymm1, ymm2/mem256    8F   RXB.09          0.1111.1.00  80 /r

Related Instructions

(V)ROUNDPD, (V)ROUNDPS, (V)ROUNDSD, (V)ROUNDSS, VFRCZPD, VFRCZSS, VFRCZSD

rFLAGS Affected 

None 




MXCSR Flags Affected 


Flag  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                             M   M           M   M
Bit   17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.vvvv != 1111b.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.
                                              See SIMD Floating-Point Exceptions below for details.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                   X     A source operand was an SNaN value.
                                        X     Undefined operation.
Denormalized operand, DE                X     A source operand was a denormal value.
Underflow, UE                           X     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           X     A result could not be represented exactly in the destination format.

X — XOP exception 
VFRCZSD Extract Fraction

Scalar Double-Precision Floating-Point 

Extracts the fractional portion of the double-precision floating-point value of either the low-order 
quadword of an XMM register or a 64-bit memory location and writes the result to the low-order 
quadword of the destination XMM register. The fractional results are precise. 

When the result is written to the destination XMM register, bits [127:64] of the destination and bits 
[255:128] of the corresponding YMM register are cleared. 

Exception conditions are the same as for other arithmetic instructions, except with respect to the sign 
of a zero result. A zero is returned in the following cases: 

• When the operand is a zero. 

• When the operand is a normal integer. 

• When the operand is a denormal value and is coerced to zero by MXCSR.DAZ. 

• When the operand is a denormal value that is not coerced to zero by MXCSR.DAZ. 

In the first three cases, when MXCSR.RC = 01b (round toward −∞) the sign of the zero result is
negative, and is otherwise positive.

In the fourth case, the operand is its own fractional part, which results in underflow, and the result is 
forced to zero by MXCSR.FZ; the result has the same sign as the operand. 

Instruction Support

Form       Subset  Feature Flag
VFRCZSD    XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                     Encoding
Mnemonic                    XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VFRCZSD xmm1, xmm2/mem64    8F   RXB.09          0.1111.0.00  83 /r

Related Instructions

(V)ROUNDPD, (V)ROUNDPS, (V)ROUNDSD, (V)ROUNDSS, VFRCZPS, VFRCZPD, VFRCZSS

rFLAGS Affected 

None 
MXCSR Flags Affected

Flag  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                             M   M           M   M
Bit   17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.vvvv != 1111b.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.
                                              See SIMD Floating-Point Exceptions below for details.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                   X     A source operand was an SNaN value.
                                        X     Undefined operation.
Denormalized operand, DE                X     A source operand was a denormal value.
Underflow, UE                           X     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           X     A result could not be represented exactly in the destination format.

X — XOP exception 
VFRCZSS Extract Fraction

Scalar Single-Precision Floating-Point

Extracts the fractional portion of the single-precision floating-point value of the low-order doubleword
of an XMM register or 32-bit memory location and writes the result in the low-order doubleword
of the destination XMM register. The fractional results are precise.

When the result is written to the destination XMM register, bits [127:32] of the destination and bits 
[255:128] of the corresponding YMM register are cleared. 

Exception conditions are the same as for other arithmetic instructions, except with respect to the sign 
of a zero result. A zero is returned in the following cases: 

• When the operand is a zero. 

• When the operand is a normal integer. 

• When the operand is a denormal value and is coerced to zero by MXCSR.DAZ. 

• When the operand is a denormal value that is not coerced to zero by MXCSR.DAZ. 

In the first three cases, when MXCSR.RC = 01b (round toward −∞) the sign of the zero result is
negative, and is otherwise positive.

In the fourth case, the operand is its own fractional part, which results in underflow, and the result is 
forced to zero by MXCSR.FZ; the result has the same sign as the operand. 

Instruction Support

Form       Subset  Feature Flag
VFRCZSS    XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                     Encoding
Mnemonic                    XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VFRCZSS xmm1, xmm2/mem32    8F   RXB.09          0.1111.0.00  82 /r


Related Instructions 

ROUNDPD, ROUNDPS, ROUNDSD, ROUNDSS, VFRCZPS, VFRCZPD, VFRCZSD 

rFLAGS Affected 

None 
MXCSR Flags Affected

Flag  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                             M   M           M   M
Bit   17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.vvvv != 1111b.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.
                                              See SIMD Floating-Point Exceptions below for details.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                              see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                   X     A source operand was an SNaN value.
                                        X     Undefined operation.
Denormalized operand, DE                X     A source operand was a denormal value.
Underflow, UE                           X     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           X     A result could not be represented exactly in the destination format.

X — XOP exception 
VGATHERDPD Conditionally Gather Double-Precision

Floating-Point Values, Doubleword Indices

Conditionally loads double-precision (64-bit) values from memory using VSIB addressing with
doubleword indices.

The instruction is of the form: 

VGATHERDPD dest, mem64[vm32x], mask 

Loading of each element of the destination register is conditional based on the value of the corresponding
element of the mask operand. If the most-significant bit of the ith element of the mask is set,
the ith element of the destination is loaded from memory using the ith address of the array of effective
addresses calculated using VSIB addressing.

The index register is treated as an array of signed 32-bit values. Quadword elements of the destination 
for which the corresponding mask element is zero are not affected by the operation. If no exceptions 
occur, the mask register is set to zero. 
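
The conditional load described above can be modeled in a few lines. This is a sketch under stated assumptions, not the hardware algorithm: memory is represented as a dictionary from byte address to element value, the element address is assumed to be the base plus the sign-extended index times the scale, and no exception occurs. The names `sign_extend32` and `gather_dpd` are hypothetical.

```python
def sign_extend32(value):
    """Interpret the low 32 bits of value as a signed doubleword."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def gather_dpd(dest, mask, memory, base, index, scale=8):
    """Load dest[i] when bit 63 of mask[i] is set, then clear the mask."""
    for i in range(len(dest)):
        if (mask[i] >> 63) & 1:
            address = base + sign_extend32(index[i]) * scale
            dest[i] = memory[address]
        # Elements whose mask bit is clear keep their previous value.
    for i in range(len(mask)):
        mask[i] = 0  # no exception occurred: the whole mask is zeroed
    return dest, mask
```

For example, with `mask = [1 << 63, 0]` only element 0 is loaded; element 1 of the destination is left unchanged, and both mask elements read as zero afterward.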

Execution of the instruction can be suspended by an exception if the exception is triggered by an element
other than the rightmost element loaded. When this happens, the destination register and the
mask operand may be observed as partially updated. Elements that have been loaded will have their 
mask elements set to zero. If any traps or faults are pending from elements that have been loaded, 
they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction 
breakpoint is not re-triggered when the instruction execution is resumed. 

See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode. 

There are 128-bit and 256-bit forms of this instruction.

XMM Encoding 

The destination is an XMM register. The first source operand is up to two 64-bit values located in 
memory. The second source operand (the mask) is an XMM register. The index vector is the two 
low-order doublewords of an XMM register; the two high-order doublewords of the index register are 
not used. Bits [255:128] of the YMM register that corresponds to the destination and bits [255:128] of 
the YMM register that corresponds to the second source (mask) operand are cleared. 

YMM Encoding 

The destination is a YMM register. The first source operand is up to four 64-bit values located in 
memory. The second source operand (the mask) is a YMM register. The index vector is the four
doublewords of an XMM register.

Instruction Support 


Form          Subset  Feature Flag
VGATHERDPD    AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                        Encoding
Mnemonic                        VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VGATHERDPD xmm1, vm32x, xmm2    C4   RXB.02          1.src2.0.01  92 /r
VGATHERDPD ymm1, vm32x, ymm2    C4   RXB.02          1.src2.1.01  92 /r

Related Instructions

VGATHERDPS, VGATHERQPD, VGATHERQPS, VPGATHERDD, VPGATHERDQ, VPGATHERQD, VPGATHERQQ

rFLAGS Affected 

RF 

MXCSR Flags Affected 

None 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         A     A     A     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A     Lock prefix (F0h) preceding opcode.
                                        A     MODRM.mod = 11b
                                        A     MODRM.rm != 100b
                                        A     YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM               A     CR0.TS = 1.
Stack, #SS                              A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A     Memory address exceeding data segment limit or non-canonical.
                                        A     Null data segment used to reference memory.
Alignment check, #AC                    A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   A     A     Instruction execution caused a page fault.

A — AVX2 exception 
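
The three encoding-related #UD causes in the table above (MODRM.mod = 11b, MODRM.rm != 100b, and non-unique registers) can be checked as sketched below. This is an illustration only: it assumes the standard ModRM layout with mod in bits 7:6 and rm in bits 2:0, takes register numbers as plain integers, and uses the hypothetical name `gather_raises_ud`.

```python
def gather_raises_ud(modrm, dest, mask, index):
    """Return True when a gather encoding hits one of the #UD causes."""
    mod = (modrm >> 6) & 0b11
    rm = modrm & 0b111
    if mod == 0b11:
        return True   # register-direct operand: no memory reference
    if rm != 0b100:
        return True   # rm must select the (V)SIB byte
    # Destination, mask, and index registers must all be different.
    return len({dest, mask, index}) != 3
```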
VGATHERDPS Conditionally Gather Single-Precision

Floating-Point Values, Doubleword Indices

Conditionally loads single-precision (32-bit) values from memory using VSIB addressing with
doubleword indices.

The instruction is of the form: 

VGATHERDPS dest, mem32[vm32x/y], mask 

Loading of each element of the destination register is conditional based on the value of the corresponding
element of the mask operand. If the most-significant bit of the ith element of the mask is set,
the ith element of the destination is loaded from memory using the ith address of the array of effective
addresses calculated using VSIB addressing.

The index register is treated as an array of signed 32-bit values. Doubleword elements of the destination
for which the corresponding mask element is zero are not affected by the operation. If no exceptions
occur, the mask register is set to zero.

Execution of the instruction can be suspended by an exception if the exception is triggered by an element
other than the rightmost element loaded. When this happens, the destination register and the
mask operand may be observed as partially updated. Elements that have been loaded will have their 
mask elements set to zero. If any traps or faults are pending from elements that have been loaded, 
they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction 
breakpoint is not re-triggered when the instruction execution is resumed. 
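
Because completed elements have their mask elements cleared, a suspended gather can be re-executed without reloading them. The sketch below models that behavior. It is a simplified illustration under stated assumptions: elements are processed in order starting from index 0, a Python exception stands in for a page fault, and `PageFault`, `gather_step`, `run_with_restart`, and the demonstration data are all hypothetical.

```python
class PageFault(Exception):
    """Stand-in for a fault raised while loading one element."""


def gather_step(dest, mask, load, addresses, msb=31):
    """One execution attempt; a fault leaves dest and mask partially updated."""
    for i, addr in enumerate(addresses):
        if (mask[i] >> msb) & 1:
            dest[i] = load(addr)   # may raise PageFault
        mask[i] = 0                # completed elements get a zero mask


def run_with_restart(dest, mask, load, addresses, fix_fault):
    """Re-execute after each fault, as would happen once a handler returns."""
    while True:
        try:
            gather_step(dest, mask, load, addresses)
            return
        except PageFault as fault:
            fix_fault(fault)


# Demonstration: address 4 is not present on the first attempt.
mem = {0: 1.0, 4: 2.0, 8: 3.0}
present = {0, 8}
loads = []


def load(addr):
    if addr not in present:
        raise PageFault(addr)
    loads.append(addr)
    return mem[addr]


dest = [0.0, 0.0, 0.0]
mask = [1 << 31, 1 << 31, 1 << 31]
run_with_restart(dest, mask, load, [0, 4, 8], lambda f: present.add(f.args[0]))
```

In this model element 0 is loaded only once: its mask element is already zero when the instruction is re-executed after the fault on element 1.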

See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode. 

There are 128-bit and 256-bit forms of this instruction.

XMM Encoding 

The destination is an XMM register. The first source operand is up to four 32-bit values located in 
memory. The second source operand (the mask) is an XMM register. The index vector is the four
doublewords of an XMM register. Bits [255:128] of the YMM register that corresponds to the destination
and bits [255:128] of the YMM register that corresponds to the second source (mask) operand are 
cleared. 

YMM Encoding 

The destination is a YMM register. The first source operand is up to eight 32-bit values located in 
memory. The second source operand (the mask) is a YMM register. The index vector is the eight
doublewords of a YMM register.

Instruction Support 


Form          Subset  Feature Flag
VGATHERDPS    AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                        Encoding
Mnemonic                        VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VGATHERDPS xmm1, vm32x, xmm2    C4   RXB.02          0.src2.0.01  92 /r
VGATHERDPS ymm1, vm32y, ymm2    C4   RXB.02          0.src2.1.01  92 /r

Related Instructions

VGATHERDPD, VGATHERQPD, VGATHERQPS, VPGATHERDD, VPGATHERDQ, VPGATHERQD, VPGATHERQQ

rFLAGS Affected 

RF 

MXCSR Flags Affected 

None 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         A     A     A     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A     Lock prefix (F0h) preceding opcode.
                                        A     MODRM.mod = 11b
                                        A     MODRM.rm != 100b
                                        A     YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM               A     CR0.TS = 1.
Stack, #SS                              A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A     Memory address exceeding data segment limit or non-canonical.
                                        A     Null data segment used to reference memory.
Alignment check, #AC                    A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   A     A     Instruction execution caused a page fault.

A — AVX2 exception 
VGATHERQPD Conditionally Gather Double-Precision

Floating-Point Values, Quadword Indices

Conditionally loads double-precision (64-bit) values from memory using VSIB addressing with
quadword indices.

The instruction is of the form: 

VGATHERQPD dest, mem64[vm64x/y], mask 

Loading of each element of the destination register is conditional based on the value of the corresponding
element of the mask operand. If the most-significant bit of the ith element of the mask is set,
the ith element of the destination is loaded from memory using the ith address of the array of effective
addresses calculated using VSIB addressing.

The index register is treated as an array of signed 64-bit values. Quadword elements of the destination 
for which the corresponding mask element is zero are not affected by the operation. If no exceptions 
occur, the mask register is set to zero. 

Execution of the instruction can be suspended by an exception if the exception is triggered by an element
other than the rightmost element loaded. When this happens, the destination register and the
mask operand may be observed as partially updated. Elements that have been loaded will have their 
mask elements set to zero. If any traps or faults are pending from elements that have been loaded, 
they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction 
breakpoint is not re-triggered when the instruction execution is resumed. 

See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode. 
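
The array of effective addresses can be sketched as follows. This is a minimal illustration that assumes each address is formed as base + (sign-extended index × scale) + displacement, in the manner of ordinary SIB addressing; Section 1.3 remains the authoritative description, and address-size truncation is not modeled. The names `to_signed` and `vsib_addresses` are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def vsib_addresses(base, index_elements, scale, disp=0, index_bits=64):
    """Return one effective address per element of the index vector."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return [base + to_signed(ix, index_bits) * scale + disp
            for ix in index_elements]
```

With quadword indices, an index element of FFFF_FFFF_FFFF_FFFFh is treated as −1, so with a scale of 8 it addresses the quadword immediately below the base.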

There are 128-bit and 256-bit forms of this instruction.

XMM Encoding 

The destination is an XMM register. The first source operand is up to two 64-bit values located in 
memory. The second source operand (the mask) is an XMM register. The index vector is the two 
quadwords of an XMM register. Bits [255:128] of the YMM register that corresponds to the destination
and bits [255:128] of the YMM register that corresponds to the second source (mask) operand are
cleared. 

YMM Encoding 

The destination is a YMM register. The first source operand is up to four 64-bit values located in 
memory. The second source operand (the mask) is a YMM register. The index vector is the four
quadwords of a YMM register.

Instruction Support 


Form          Subset  Feature Flag
VGATHERQPD    AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.




Instruction Encoding 

                                        Encoding
Mnemonic                        VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VGATHERQPD xmm1, vm64x, xmm2    C4   RXB.02          1.src2.0.01  93 /r
VGATHERQPD ymm1, vm64y, ymm2    C4   RXB.02          1.src2.1.01  93 /r

Related Instructions

VGATHERDPD, VGATHERDPS, VGATHERQPS, VPGATHERDD, VPGATHERDQ, VPGATHERQD, VPGATHERQQ

rFLAGS Affected 

RF 

MXCSR Flags Affected 

None 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         A     A     A     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A     Lock prefix (F0h) preceding opcode.
                                        A     MODRM.mod = 11b
                                        A     MODRM.rm != 100b
                                        A     YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM               A     CR0.TS = 1.
Stack, #SS                              A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A     Memory address exceeding data segment limit or non-canonical.
                                        A     Null data segment used to reference memory.
Alignment check, #AC                    A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   A     A     Instruction execution caused a page fault.

A — AVX2 exception 
VGATHERQPS Conditionally Gather Single-Precision

Floating-Point Values, Quadword Indices

Conditionally loads single-precision (32-bit) values from memory using VSIB addressing with
quadword indices.

The instruction is of the form: 

VGATHERQPS dest, mem32[vm64x/y], mask 

Loading of each element of the destination register is conditional based on the value of the corresponding
element of the mask operand. If the most-significant bit of the ith element of the mask is set,
the ith element of the destination is loaded from memory using the ith address of the array of effective
addresses calculated using VSIB addressing.

The index register is treated as an array of signed 64-bit values. Doubleword elements of the destination
for which the corresponding mask element is zero are not affected by the operation. The upper
half of the destination is zeroed. If no exceptions occur, the mask register is set to zero.

Execution of the instruction can be suspended by an exception if the exception is triggered by an element
other than the rightmost element loaded. When this happens, the destination register and the
mask operand may be observed as partially updated. Elements that have been loaded will have their 
mask elements set to zero. If any traps or faults are pending from elements that have been loaded, 
they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction 
breakpoint is not re-triggered when the instruction execution is resumed. 

See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode. 

There are 128-bit and 256-bit forms of this instruction.

XMM Encoding 

The destination is an XMM register. The first source operand is up to two 32-bit values located in 
memory. The second source operand (the mask) is an XMM register. Only the lower half of the mask 
is used. The index vector is the two quadwords of an XMM register. Bits [255:64] of the YMM register
that corresponds to the destination and bits [255:64] of the YMM register that corresponds to the
second source (mask) operand are cleared. 

YMM Encoding 

The destination is an XMM register. The first source operand is up to four 32-bit values located in 
memory. The second source operand (the mask) is an XMM register. The index vector is the four 
quadwords of a YMM register. Bits [255:128] of the YMM register that corresponds to the destination
and bits [255:128] of the YMM register that corresponds to the second source (mask) operand are
cleared. 
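
The difference between the two encodings can be sketched on the four doubleword lanes of the destination XMM register. This is an illustrative model under stated assumptions: memory is a dictionary from byte address to value, indices are already signed integers, no exception occurs, and `gather_qps` is a hypothetical name.

```python
def gather_qps(dest, mask, memory, base, index, scale=4):
    """Model VGATHERQPS on the four 32-bit lanes of an XMM register.

    index holds two quadwords (XMM encoding) or four (YMM encoding).
    """
    n = len(index)
    for i in range(n):
        if (mask[i] >> 31) & 1:
            dest[i] = memory[base + index[i] * scale]
    for i in range(n, 4):
        dest[i] = 0.0          # XMM encoding: bits [127:64] are cleared
    mask[:] = [0, 0, 0, 0]     # no exception occurred: mask is zeroed
    return dest, mask
```

With two indices only lanes 0 and 1 can be loaded and lanes 2 and 3 read as zero; with four indices all four lanes are eligible and unselected lanes keep their previous contents.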

Instruction Support 


Form            Subset    Feature Flag
VGATHERQPS      AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VGATHERQPS xmm1, vm64x, xmm2        C4    RXB.02           0.src2.0.01   93 /r
VGATHERQPS xmm1, vm64y, xmm2        C4    RXB.02           0.src2.1.01   93 /r


Related Instructions 

VGATHERDPD, VGATHERDPS, VGATHERQPD, VPGATHERDD, VPGATHERDQ, VPGATHERQD,
VPGATHERQQ

rFLAGS Affected 

RF 

MXCSR Flags Affected 

None 


Exceptions 


                                   Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD           A     A     A     Instruction not supported, as indicated by CPUID feature identifier.
                              A     A           AVX instructions are only recognized in protected mode.
                                          A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                          A     Lock prefix (F0h) preceding opcode.
                                          A     MODRM.mod = 11b
                                          A     MODRM.rm != 100b
                                          A     YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM                 A     CR0.TS = 1.
Stack, #SS                                A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                   A     Memory address exceeding data segment limit or non-canonical.
                                          A     Null data segment used to reference memory.
Alignment check, #AC                      A     Alignment checking enabled and:
                                                256-bit memory operand not 32-byte aligned or
                                                128-bit memory operand not 16-byte aligned.
Page fault, #PF                     A     A     Instruction execution caused a page fault.

A — AVX2 exception 
VINSERTF128 Insert Packed Floating-Point Values 128-bit

Combines 128 bits of data from a YMM register with 128-bit packed-value data from an XMM
register or a 128-bit memory location, as specified by an immediate byte operand, and writes the
combined data to the destination.

Only bit [0] of the immediate operand is used. Operation is as follows. 

• When imm8[0] = 0, copy bits [255:128] of the first source to bits [255:128] of the destination and 
copy bits [127:0] of the second source to bits [127:0] of the destination. 

• When imm8[0] = 1, copy bits [127:0] of the first source to bits [127:0] of the destination and copy 
bits [127:0] of the second source to bits [255:128] of the destination. 
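
The two cases can be sketched as a small Python model. This is an illustrative sketch only; the function name vinsertf128 and the representation of a YMM value as a (low, high) pair of 128-bit halves are assumptions, not part of the instruction definition.

```python
def vinsertf128(src1, src2, imm8):
    """Model of the 128-bit insert.

    src1 -- (low, high) pair: the two 128-bit halves of the first source
    src2 -- the 128-bit second source
    imm8 -- immediate byte; only bit 0 is examined
    Returns the (low, high) halves of the destination.
    """
    low, high = src1
    if imm8 & 1:
        return (low, src2)   # imm8[0] = 1: second source replaces the upper half
    return (src2, high)      # imm8[0] = 0: second source replaces the lower half
```

Because only bit 0 is used, an immediate such as FEh selects the same half as 00h in this model.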

This extended-form instruction has a single 256-bit encoding. 

The first source operand is a YMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a YMM register. There is a third immediate byte
operand.


Instruction Support 


Form            Subset    Feature Flag
VINSERTF128     AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                                      Encoding
                                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VINSERTF128 ymm1, ymm2, xmm3/mem128, imm8     C4    RXB.03           0.src.1.01    18 /r ib

Related Instructions 

VBROADCASTF128, VBROADCASTI128, VEXTRACTF128, VEXTRACTI128, VINSERTI128 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions


                                   Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD                       A     Instruction not supported, as indicated by CPUID feature identifier.
                              A     A           AVX instructions are only recognized in protected mode.
                                          A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          A     VEX.W = 1.
                                          A     VEX.L = 0.
                                          A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                          A     Lock prefix (F0h) preceding opcode.
Device not available, #NM                 A     CR0.TS = 1.
Stack, #SS                                A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                   A     Memory address exceeding data segment limit or non-canonical.
                                          A     Null data segment used to reference memory.
Page fault, #PF                           A     Instruction execution caused a page fault.
Alignment check, #AC                      A     Memory operand not 16-byte aligned when alignment checking enabled.

A — AVX exception. 
VINSERTI128 Insert Packed Integer Values 128-bit

Combines 128 bits of data from a YMM register with 128-bit packed-value data from an XMM
register or a 128-bit memory location, as specified by an immediate byte operand, and writes the
combined data to the destination.

Bit [0] of the immediate operand controls how the 128-bit values from the source operands are 
merged into the destination. The operation is as follows. 

• When imm8[0] = 0, copy bits [255:128] of the first source to bits [255:128] of the destination and 
copy bits [127:0] of the second source to bits [127:0] of the destination. 

• When imm8[0] = 1, copy bits [127:0] of the first source to bits [127:0] of the destination and copy 
bits [127:0] of the second source to bits [255:128] of the destination. 
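
Viewed at the bit level, the merge is a mask-and-shift on a 256-bit quantity. The Python sketch below is an illustration under stated assumptions: registers are modeled as plain non-negative integers, and the names vinserti128 and MASK128 are hypothetical.

```python
MASK128 = (1 << 128) - 1  # selects bits [127:0]

def vinserti128(src1, src2, imm8):
    """Model of the 128-bit integer insert on Python integers.

    src1 -- 256-bit first source
    src2 -- 128-bit second source (bits above 127 are ignored)
    imm8 -- immediate byte; bit 0 selects the half that src2 replaces
    """
    src2 &= MASK128
    if imm8 & 1:
        # keep bits [127:0] of src1, place src2 in bits [255:128]
        return (src1 & MASK128) | (src2 << 128)
    # keep bits [255:128] of src1, place src2 in bits [127:0]
    return (src1 & (MASK128 << 128)) | src2
```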

This instruction has a single 256-bit encoding. 

The first source operand is a YMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a YMM register. The immediate byte is encoded in the 
instruction. 


Instruction Support 


Form            Subset    Feature Flag
VINSERTI128     AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                                      Encoding
                                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VINSERTI128 ymm1, ymm2, xmm3/mem128, imm8     C4    RXB.03           0.src1.1.01   38 /r ib

Related Instructions 

VBROADCASTF128, VBROADCASTI128, VEXTRACTF128, VEXTRACTI128, VINSERTF128 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions


                                   Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD                       A     Instruction not supported, as indicated by CPUID feature identifier.
                              A     A           AVX instructions are only recognized in protected mode.
                                          A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          A     VEX.W = 1.
                                          A     VEX.L = 0.
                                          A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                          A     Lock prefix (F0h) preceding opcode.
Device not available, #NM                 A     CR0.TS = 1.
Stack, #SS                                A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                   A     Memory address exceeding data segment limit or non-canonical.
                                          A     Null data segment used to reference memory.
Page fault, #PF                           A     Instruction execution caused a page fault.
Alignment check, #AC                      A     Memory operand not 16-byte aligned when alignment checking enabled.

A — AVX exception. 
VMASKMOVPD Masked Move Packed Double-Precision

Moves packed double-precision data elements from a source element to a destination element, as 
specified by mask bits in a source operand. There are load and store versions of the instruction. 

For loads, the data elements are in a source memory location; for stores, the data elements are in a
source register. Each mask bit is the most-significant bit of the corresponding data element of a
source register.

• For loads, when a mask bit = 1, the corresponding data element is copied from the source to the 
same element of the destination; when a mask bit = 0, the corresponding element of the destination 
is cleared. 

• For stores, when a mask bit = 1, the corresponding data element is copied from the source to the 
same element of the destination; when a mask bit = 0, the corresponding element of the destination 
is not affected. 
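
The difference between the load and store forms can be summarized with a short Python model. This is a sketch for illustration; the function names and the use of lists for packed elements are assumptions, and the model ignores the implementation-dependent exception behavior of unselected elements.

```python
def maskmov_load(mask_bits, memory_elems):
    """Masked load: selected elements are copied, unselected elements are cleared."""
    return [m if bit else 0 for bit, m in zip(mask_bits, memory_elems)]

def maskmov_store(mask_bits, source_elems, memory_elems):
    """Masked store: selected elements are written, unselected memory is unchanged.

    Returns the memory contents after the store.
    """
    return [s if bit else m
            for bit, s, m in zip(mask_bits, source_elems, memory_elems)]
```

The asymmetry is the point of the example: a load zeroes what it does not select, while a store leaves it alone.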

Exception and trap behavior for elements not selected for loading or storing from/to memory is 
implementation dependent. For instance, a given implementation may signal a data breakpoint or a 
page fault for quadwords that are zero-masked and not actually written. 

XMM Encoding 

There are load and store encodings. 

• For loads, there are two 64-bit source data elements in a 128-bit memory location, the mask 
operand is an XMM register, and the destination is an XMM register. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. 

• For stores, there are two 64-bit source data elements in an XMM register, the mask operand is an 
XMM register, and the destination is a 128-bit memory location. 

YMM Encoding 

There are load and store encodings. 

• For loads, there are four 64-bit source data elements in a 256-bit memory location, the mask 
operand is a YMM register, and the destination is a YMM register. 

• For stores, there are four 64-bit source data elements in a YMM register, the mask operand is a
YMM register, and the destination is a 256-bit memory location.


Instruction Support 


Form            Subset    Feature Flag
VMASKMOVPD      AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
Loads:
VMASKMOVPD xmm1, xmm2, mem128       C4    RXB.02           0.src1.0.01   2D /r
VMASKMOVPD ymm1, ymm2, mem256       C4    RXB.02           0.src1.1.01   2D /r
Stores:
VMASKMOVPD mem128, xmm1, xmm2       C4    RXB.02           0.src1.0.01   2F /r
VMASKMOVPD mem256, ymm1, ymm2       C4    RXB.02           0.src1.1.01   2F /r


Related Instructions 

VMASKMOVPS 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                                   Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD                       A     Instruction not supported, as indicated by CPUID feature identifier.
                              A     A           AVX instructions are only recognized in protected mode.
                                          A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          A     VEX.W = 1.
                                          A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                          A     Lock prefix (F0h) preceding opcode.
Device not available, #NM                 A     CR0.TS = 1.
Stack, #SS                                A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                   A     Memory address exceeding data segment limit or non-canonical.
                                          A     Null data segment used to reference memory.
                              S     S     X     Write to a read-only data segment.
Page fault, #PF                           A     Instruction execution caused a page fault.

A — AVX exception. 
VMASKMOVPS Masked Move Packed Single-Precision

Moves packed single-precision data elements from a source element to a destination element, as
specified by mask bits in a source operand. There are load and store versions of the instruction.

For loads, the data elements are in a source memory location; for stores, the data elements are in a
source register. Each mask bit is the most-significant bit of the corresponding data element of a
source register.

• For loads, when a mask bit = 1, the corresponding data element is copied from the source to the 
same element of the destination; when a mask bit = 0, the corresponding element of the destination 
is cleared. 

• For stores, when a mask bit = 1, the corresponding data element is copied from the source to the 
same element of the destination; when a mask bit = 0, the corresponding element of the destination 
is not affected. 
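
Because each mask bit is the most-significant bit of a 32-bit element, any value whose encoding has bit 31 set selects its element. The Python sketch below illustrates this for single-precision values, where bit 31 is the sign bit; the helper name mask_bits_from_floats is an assumption made for the example.

```python
import struct

def mask_bits_from_floats(values):
    """Return bit 31 of the IEEE 754 single-precision encoding of each value."""
    bits = []
    for value in values:
        # Reinterpret the float's 32-bit encoding as an unsigned integer.
        (raw,) = struct.unpack(">I", struct.pack(">f", value))
        bits.append(raw >> 31)
    return bits
```

Note that negative zero has its sign bit set, so it selects its element even though it compares equal to zero.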

Exception and trap behavior for elements not selected for loading or storing from/to memory is 
implementation dependent. For instance, a given implementation may signal a data breakpoint or a 
page fault for doublewords that are zero-masked and not actually written. 

XMM Encoding 

There are load and store encodings. 

• For loads, there are four 32-bit source data elements in a 128-bit memory location, the mask 
operand is an XMM register, and the destination is an XMM register. Bits [255:128] of the YMM 
register that corresponds to the destination are cleared. 

• For stores, there are four 32-bit source data elements in an XMM register, the mask operand is an 
XMM register, and the destination is a 128-bit memory location. 

YMM Encoding 

There are load and store encodings. 

• For loads, there are eight 32-bit source data elements in a 256-bit memory location, the mask 
operand is a YMM register, and the destination is a YMM register. 

• For stores, there are eight 32-bit source data elements in a YMM register, the mask operand is a
YMM register, and the destination is a 256-bit memory location.


Instruction Support 


Form            Subset    Feature Flag
VMASKMOVPS      AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding


Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
Loads:
VMASKMOVPS xmm1, xmm2, mem128       C4    RXB.02           0.src1.0.01   2C /r
VMASKMOVPS ymm1, ymm2, mem256       C4    RXB.02           0.src1.1.01   2C /r
Stores:
VMASKMOVPS mem128, xmm1, xmm2       C4    RXB.02           0.src1.0.01   2E /r
VMASKMOVPS mem256, ymm1, ymm2       C4    RXB.02           0.src1.1.01   2E /r


Related Instructions 

VMASKMOVPD

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                                   Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD                       A     Instruction not supported, as indicated by CPUID feature identifier.
                              A     A           AVX instructions are only recognized in protected mode.
                                          A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          A     VEX.W = 1.
                                          A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                          A     Lock prefix (F0h) preceding opcode.
Device not available, #NM                 A     CR0.TS = 1.
Stack, #SS                                A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                   A     Memory address exceeding data segment limit or non-canonical.
                                          A     Null data segment used to reference memory.
                              S     S     X     Write to a read-only data segment.
Page fault, #PF                           A     Instruction execution caused a page fault.

A — AVX exception. 
VPBLENDD Blend Packed Doublewords

Copies packed doublewords from either of two sources to a destination, as specified by an immediate 
8-bit mask operand. 

Each bit of the mask selects a doubleword from one of the source operands to be copied to the
destination. The least-significant bit controls the selection of the doubleword to be copied to the
lowest doubleword of the destination. For each doubleword i of the destination:

• When mask bit [i] = 0, doubleword i of the first source operand is copied to the corresponding
doubleword of the destination.

• When mask bit [i] = 1, doubleword i of the second source operand is copied to the corresponding
doubleword of the destination.
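
The per-element selection can be written as a one-line Python model. This is an illustrative sketch; the function name vpblendd and the list representation of packed doublewords are assumptions.

```python
def vpblendd(src1, src2, imm8):
    """Select doubleword i from src2 when imm8 bit i is 1, else from src1.

    src1, src2 -- lists of four (128-bit form) or eight (256-bit form) doublewords
    """
    return [src2[i] if (imm8 >> i) & 1 else src1[i] for i in range(len(src1))]
```

For example, a mask of 0101b takes doublewords 0 and 2 from the second source and doublewords 1 and 3 from the first.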


The instruction has 128-bit and 256-bit encodings. 

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form            Subset    Feature Flag
VPBLENDD        AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                                      Encoding
                                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPBLENDD xmm1, xmm2, xmm3/mem128, imm8        C4    RXB.03           0.src1.0.01   02 /r /ib
VPBLENDD ymm1, ymm2, ymm3/mem256, imm8        C4    RXB.03           0.src1.1.01   02 /r /ib


Related Instructions 

VPBLENDW

rFLAGS Affected 

None 
MXCSR Flags Affected

None 


Exceptions 


                                   Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD           A     A     A     Instruction not supported, as indicated by CPUID feature identifier.
                              A     A           AVX instructions are only recognized in protected mode.
                                          A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          A     VEX.W = 1.
                                          A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                          A     Lock prefix (F0h) preceding opcode.
Device not available, #NM                 A     CR0.TS = 1.
Stack, #SS                                A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                   A     Memory address exceeding data segment limit or non-canonical.
                                          A     Null data segment used to reference memory.
Alignment check, #AC                      A     Alignment checking enabled and:
                                                256-bit memory operand not 32-byte aligned or
                                                128-bit memory operand not 16-byte aligned.
Page fault, #PF                           A     Instruction execution caused a page fault.

A — AVX2 exception 
VPBROADCASTB Broadcast Packed Byte

Loads a byte from a register or memory and writes it to all 16 or 32 bytes of an XMM or YMM
register.

This instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Copies the source operand to all 16 bytes of the destination. 

The source operand is the least-significant 8 bits of an XMM register or an 8-bit memory location.
The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.

YMM Encoding 

Copies the source operand to all 32 bytes of the destination. 

The source operand is the least-significant 8 bits of an XMM register or an 8-bit memory location. 
The destination is a YMM register. 
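
The broadcast operation is the same for every element size, so a single parameterized model covers the byte, word, doubleword, and quadword variants. The Python sketch below is illustrative only; the function name vpbroadcast and the use of integers for register contents are assumptions.

```python
def vpbroadcast(source, element_bits, dest_bits):
    """Replicate the least-significant element of source across the destination.

    source       -- source register contents as a non-negative integer
    element_bits -- 8, 16, 32, or 64
    dest_bits    -- 128 (XMM) or 256 (YMM)
    """
    element = source & ((1 << element_bits) - 1)  # only the low element is used
    result = 0
    for i in range(dest_bits // element_bits):
        result |= element << (i * element_bits)
    return result
```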


Instruction Support 


Form            Subset    Feature Flag
VPBROADCASTB    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding 

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPBROADCASTB xmm1, xmm2/mem8        C4    RXB.02           0.1111.0.01   78 /r
VPBROADCASTB ymm1, xmm2/mem8        C4    RXB.02           0.1111.1.01   78 /r
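
The encoding fields tabulated here pack into the second and third bytes of the three-byte VEX prefix. The Python sketch below shows one way to assemble them, assuming the standard three-byte VEX layout (RXB in bits 7:5 and map_select in bits 4:0 of the second byte; W in bit 7, vvvv in bits 6:3, L in bit 2, and pp in bits 1:0 of the third byte). The function name vex3_prefix is hypothetical, and the arguments are the field values exactly as they are stored in the prefix, so no inversion is applied.

```python
def vex3_prefix(rxb, map_select, w, vvvv, l, pp):
    """Assemble the three-byte VEX prefix from already-encoded field values.

    rxb and vvvv are passed as stored in the prefix (the VEX convention
    stores R, X, B and vvvv in inverted form; this helper does not invert).
    """
    byte1 = ((rxb & 0x7) << 5) | (map_select & 0x1F)
    byte2 = ((w & 1) << 7) | ((vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 0x3)
    return bytes([0xC4, byte1, byte2])
```

With RXB stored as 111b, map_select 02h, and W.vvvv.L.pp = 0.1111.0.01, the helper yields C4 E2 79; setting L = 1 for the 256-bit form changes the last byte to 7D.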


Related Instructions 

VPBROADCASTD, VPBROADCASTQ, VPBROADCASTW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions


                                   Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD                       A     Instruction not supported, as indicated by CPUID feature identifier.
                              A     A           AVX instructions are only recognized in protected mode.
                                          A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          A     VEX.W = 1.
                                          A     VEX.vvvv != 1111b.
                                          A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                          A     Lock prefix (F0h) preceding opcode.
Device not available, #NM                 A     CR0.TS = 1.
Stack, #SS                                A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                   A     Memory address exceeding data segment limit or non-canonical.
                                          A     Null data segment used to reference memory.
Page fault, #PF                           A     Instruction execution caused a page fault.
Alignment check, #AC                      A     Unaligned memory reference when alignment checking enabled.

A — AVX exception. 
VPBROADCASTD Broadcast Packed Doubleword

Loads a doubleword from a register or memory and writes it to all 4 or 8 doublewords of an XMM or 
YMM register. 

This instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Copies the source operand to all 4 doublewords of the destination. 

The source operand is the least-significant 32 bits of an XMM register or a 32-bit memory location.
The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.

YMM Encoding 

Copies the source operand to all 8 doublewords of the destination. 

The source operand is the least-significant 32 bits of an XMM register or a 32-bit memory location. 
The destination is a YMM register. 


Instruction Support 


Form            Subset    Feature Flag
VPBROADCASTD    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding 

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPBROADCASTD xmm1, xmm2/mem32       C4    RXB.02           0.1111.0.01   58 /r
VPBROADCASTD ymm1, xmm2/mem32       C4    RXB.02           0.1111.1.01   58 /r


Related Instructions 

VPBROADCASTB, VPBROADCASTQ, VPBROADCASTW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions


                                   Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD                       A     Instruction not supported, as indicated by CPUID feature identifier.
                              A     A           AVX instructions are only recognized in protected mode.
                                          A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          A     VEX.W = 1.
                                          A     VEX.vvvv != 1111b.
                                          A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                          A     Lock prefix (F0h) preceding opcode.
Device not available, #NM                 A     CR0.TS = 1.
Stack, #SS                                A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                   A     Memory address exceeding data segment limit or non-canonical.
                                          A     Null data segment used to reference memory.
Page fault, #PF                           A     Instruction execution caused a page fault.
Alignment check, #AC                      A     Unaligned memory reference when alignment checking enabled.

A — AVX exception. 
VPBROADCASTQ Broadcast Packed Quadword

Loads a quadword from a register or memory and writes it to all 2 or 4 quadwords of an XMM or 
YMM register. 

This instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Copies the source operand to both quadwords of the destination. 

The source operand is the least-significant 64 bits of an XMM register or a 64-bit memory location.
The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.

YMM Encoding 

Copies the source operand to all 4 quadwords of the destination. 

The source operand is the least-significant 64 bits of an XMM register or a 64-bit memory location. 
The destination is a YMM register. 

Instruction Support 


Form            Subset    Feature Flag
VPBROADCASTQ    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding 

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPBROADCASTQ xmm1, xmm2/mem64       C4    RXB.02           0.1111.0.01   59 /r
VPBROADCASTQ ymm1, xmm2/mem64       C4    RXB.02           0.1111.1.01   59 /r


Related Instructions 

VPBROADCASTB, VPBROADCASTD, VPBROADCASTW 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions


                                   Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD                       A     Instruction not supported, as indicated by CPUID feature identifier.
                              A     A           AVX instructions are only recognized in protected mode.
                                          A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          A     VEX.W = 1.
                                          A     VEX.vvvv != 1111b.
                                          A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                          A     Lock prefix (F0h) preceding opcode.
Device not available, #NM                 A     CR0.TS = 1.
Stack, #SS                                A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                   A     Memory address exceeding data segment limit or non-canonical.
                                          A     Null data segment used to reference memory.
Page fault, #PF                           A     Instruction execution caused a page fault.
Alignment check, #AC                      A     Unaligned memory reference when alignment checking enabled.

A — AVX exception. 
VPBROADCASTW Broadcast Packed Word

Loads a word from a register or memory and writes it to all 8 or 16 words of an XMM or YMM
register.

This instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

Copies the source operand to all 8 words of the destination. 

The source operand is the least-significant 16 bits of an XMM register or a 16-bit memory location.
The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.

YMM Encoding 

Copies the source operand to all 16 words of the destination. 

The source operand is the least-significant 16 bits of an XMM register or a 16-bit memory location. 
The destination is a YMM register. 

Instruction Support 


Form            Subset    Feature Flag
VPBROADCASTW    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding 

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPBROADCASTW xmm1, xmm2/mem16       C4    RXB.02           0.1111.0.01   79 /r
VPBROADCASTW ymm1, xmm2/mem16       C4    RXB.02           0.1111.1.01   79 /r


Related Instructions 

VPBROADCASTB, VPBROADCASTD, VPBROADCASTQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions


                                   Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD                       A     Instruction not supported, as indicated by CPUID feature identifier.
                              A     A           AVX instructions are only recognized in protected mode.
                                          A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          A     VEX.W = 1.
                                          A     VEX.vvvv != 1111b.
                                          A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                          A     Lock prefix (F0h) preceding opcode.
Device not available, #NM                 A     CR0.TS = 1.
Stack, #SS                                A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                   A     Memory address exceeding data segment limit or non-canonical.
                                          A     Null data segment used to reference memory.
Page fault, #PF                           A     Instruction execution caused a page fault.
Alignment check, #AC                      A     Unaligned memory reference when alignment checking enabled.

A — AVX exception. 
VPCMOV Vector Conditional Move

Moves bits of either the first source or the second source to the corresponding positions in the
destination, depending on the value of the corresponding bit of a third source.

When a bit of the third source = 1, the corresponding bit of the first source is moved to the
destination; when a bit of the third source = 0, the corresponding bit of the second source is moved
to the destination.

This instruction directly implements the C-language ternary “?” operation on each source bit. 
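
As a minimal sketch of that per-bit selection, the Python function below models the operands as non-negative integers. The name vpcmov and the width parameter are assumptions made for illustration.

```python
def vpcmov(src1, src2, src3, width=128):
    """Bitwise select: take each bit from src1 where src3 is 1, else from src2.

    width -- register width in bits (128 for XMM, 256 for YMM)
    """
    all_ones = (1 << width) - 1
    return ((src1 & src3) | (src2 & ~src3)) & all_ones
```

Written out, each result bit is (src3 ? src1 : src2), applied independently at every bit position.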

Arbitrary bit-granular predicates can be constructed by any number of methods, or loaded as
constants from memory. This instruction may use the results of any SSE instructions as the predicate
in the selector. VPCMPEQB (VPCMPGTB), VPCMPEQW (VPCMPGTW), VPCMPEQD (VPCMPGTD)
and VPCMPEQQ (VPCMPGTQ) compare byte, word, doubleword, and quadword integers,
respectively, and set the predicate in the destination to masks of 1s and 0s accordingly.
VCMPPS (VCMPSS) and VCMPPD (VCMPSD) compare single-precision and double-precision
floating-point source values, respectively, and provide the predicate for the floating-point instructions.

There are four operands: VPCMOV dest, src1, src2, src3.

The first source (src1) is an XMM or YMM register specified by XOP.vvvv.

XOP.W and bits [7:4] of an immediate byte (imm8) configure src2 and src3:

• When XOP.W = 0, src2 is either a register or a memory location specified by ModRM.r/m and src3
is a register specified by imm8[7:4].

• When XOP.W = 1, src2 is a register specified by imm8[7:4] and src3 is either a register or a
memory location specified by ModRM.r/m.

The destination (dest) is either an XMM or a YMM register, as determined by XOP.L. When the
destination is an XMM register, bits [255:128] of the corresponding YMM register are cleared.


Instruction Support 


Form            Subset    Feature Flag
VPCMOV          XOP       CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 


Mnemonic                                      Encoding
                                              XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPCMOV xmm1, xmm2, xmm3/mem128, xmm4          8F    RXB.08           0.src1.0.00   A2 /r ib
VPCMOV ymm1, ymm2, ymm3/mem256, ymm4          8F    RXB.08           0.src1.1.00   A2 /r ib
VPCMOV xmm1, xmm2, xmm3, xmm4/mem128          8F    RXB.08           1.src1.0.00   A2 /r ib
VPCMOV ymm1, ymm2, ymm3, ymm4/mem256          8F    RXB.08           1.src1.1.00   A2 /r ib


Related Instructions 

VPCOMUB, VPCOMUD, VPCOMUQ, VPCOMUW, VCMPPD, VCMPPS 
rFLAGS Affected

None 

MXCSR Flags Affected 

None 


Exceptions 


                                   Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD                       X     Instruction not supported, as indicated by CPUID feature identifier.
                              X     X           XOP instructions are only recognized in protected mode.
                                          X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                          X     Lock prefix (F0h) preceding opcode.
Device not available, #NM                 X     CR0.TS = 1.
Stack, #SS                                X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                   X     Memory address exceeding data segment limit or non-canonical.
                                          X     Null data segment used to reference memory.
Page fault, #PF                           X     Instruction execution caused a page fault.
Alignment check, #AC                      X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPCOMB Compare Vector Signed Bytes

Compares corresponding packed signed bytes in the first and second sources and writes the result of
each comparison in the corresponding byte of the destination. The result of each comparison is an
8-bit value of all 1s (TRUE) or all 0s (FALSE).

There are four operands: VPCOMB dest, src1, src2, imm8

The destination (dest) is an XMM register specified by ModRM.reg. When the comparison results
are written to the destination XMM register, bits [255:128] of the corresponding YMM register are
cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source
(src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.

The comparison type is specified by bits [2:0] of the immediate-byte operand (imm8). Each type has
an alias mnemonic to facilitate coding.


imm8[2:0]    Comparison               Mnemonic
000          Less Than                VPCOMLTB
001          Less Than or Equal       VPCOMLEB
010          Greater Than             VPCOMGTB
011          Greater Than or Equal    VPCOMGEB
100          Equal                    VPCOMEQB
101          Not Equal                VPCOMNEQB
110          False                    VPCOMFALSEB
111          True                     VPCOMTRUEB
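
The comparison encodings can be exercised with a small Python model. This is a sketch for illustration only; the names PREDICATES and vpcomb are assumptions, and the signed byte elements are represented as Python integers in the range -128 to 127.

```python
import operator

# imm8[2:0] -> comparison applied to each pair of signed bytes
PREDICATES = {
    0b000: operator.lt,           # Less Than
    0b001: operator.le,           # Less Than or Equal
    0b010: operator.gt,           # Greater Than
    0b011: operator.ge,           # Greater Than or Equal
    0b100: operator.eq,           # Equal
    0b101: operator.ne,           # Not Equal
    0b110: lambda a, b: False,    # False
    0b111: lambda a, b: True,     # True
}

def vpcomb(src1, src2, imm8):
    """Compare signed bytes pairwise; each result byte is FFh (TRUE) or 00h (FALSE)."""
    compare = PREDICATES[imm8 & 0b111]  # only bits [2:0] select the comparison
    return [0xFF if compare(a, b) else 0x00 for a, b in zip(src1, src2)]
```

Because the elements are signed, a byte holding -1 (FFh) compares less than a byte holding 1 in this model.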


Instruction Support 


Form     Subset   Feature Flag
VPCOMB   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
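As an illustration of the feature flag above, the following sketch tests bit 11 of an ECX value. It assumes the caller has already obtained the ECX result of CPUID Fn8000_0001 by some other means (Python cannot execute CPUID directly); the names `XOP_BIT` and `has_xop` are hypothetical.

```python
XOP_BIT = 11  # bit position of the XOP flag in CPUID Fn8000_0001 ECX


def has_xop(ecx_fn8000_0001):
    """Return True when the XOP feature bit is set in the given ECX value."""
    return bool((ecx_fn8000_0001 >> XOP_BIT) & 1)
```

A caller would pass the raw 32-bit register value, for example `has_xop(0x00000800)`.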


Instruction Encoding 

                                                  Encoding
Mnemonic                               XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPCOMB xmm1, xmm2, xmm3/mem128, imm8   8F    RXB.08           0.src1.0.00   CC /r ib
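The encoding row above can be turned into bytes for the register-to-register form. This is a sketch under stated assumptions: it assumes the usual three-byte XOP prefix layout, in which the R, X, B, and vvvv fields are stored in inverted (ones' complement) form; that layout is defined elsewhere in the manual and is not restated in this section. The function name `encode_vpcomb_reg` is hypothetical.

```python
def encode_vpcomb_reg(dest, src1, src2, imm8):
    """Encode VPCOMB xmm<dest>, xmm<src1>, xmm<src2>, imm8 (register form)."""
    for reg in (dest, src1, src2):
        if not 0 <= reg <= 15:
            raise ValueError("XMM register number must be in 0..15")
    r = (dest >> 3) & 1   # high bit of the ModRM.reg register number
    x = 0                 # no SIB index in the register form
    b = (src2 >> 3) & 1   # high bit of the ModRM.r/m register number
    # Byte 1: inverted R, X, B in bits [7:5]; map_select = 08h in bits [4:0].
    byte1 = ((r ^ 1) << 7) | ((x ^ 1) << 6) | ((b ^ 1) << 5) | 0x08
    # Byte 2: W = 0, vvvv = inverted src1, L = 0, pp = 00.
    byte2 = ((~src1 & 0xF) << 3)
    # ModRM: mod = 11b (register operand), reg = dest, r/m = src2.
    modrm = (0b11 << 6) | ((dest & 7) << 3) | (src2 & 7)
    return bytes([0x8F, byte1, byte2, 0xCC, modrm, imm8 & 0xFF])
```

Under these assumptions, `encode_vpcomb_reg(1, 2, 3, 0).hex()` gives `8fe868cccb00`. The memory form would need ModRM, SIB, and displacement handling that this sketch omits.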

Related Instructions 

VPCOMUB, VPCOMUW, VPCOMUD, VPCOMUQ, VPCOMW, VPCOMD, VPCOMQ 

rFLAGS Affected 

None 
MXCSR Flags Affected

None

Exceptions

                                    Mode
Exception                    Real   Virt   Prot   Cause of Exception
Invalid opcode, #UD                         X     Instruction not supported, as indicated by CPUID feature identifier.
                              X      X            XOP instructions are only recognized in protected mode.
                                            X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                            X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                            X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                            X     Lock prefix (F0h) preceding opcode.
Device not available, #NM                   X     CR0.TS = 1.
Stack, #SS                                  X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                     X     Memory address exceeding data segment limit or non-canonical.
                                            X     Null data segment used to reference memory.
Page fault, #PF                             X     Instruction execution caused a page fault.
Alignment check, #AC                        X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
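Two of the #UD causes in the table above depend on operating-system state rather than on the processor alone. The sketch below checks them from two raw values supplied by the caller: the ECX result of CPUID Fn0000_0001 and the value of XFEATURE_ENABLED_MASK. The function name is hypothetical, and the position of OSXSAVE (bit 27 of ECX) comes from the CPUID definition rather than from this section.

```python
OSXSAVE_BIT = 27  # CPUID Fn0000_0001 ECX[OSXSAVE]; mirrors CR4.OSXSAVE


def extended_state_enabled(cpuid_fn1_ecx, xfeature_enabled_mask):
    """Return True when neither OS-state #UD condition applies.

    The table lists CR4.OSXSAVE = 0 and XFEATURE_ENABLED_MASK[2:1] != 11b
    as causes of #UD, so both must be ruled out.
    """
    osxsave = (cpuid_fn1_ecx >> OSXSAVE_BIT) & 1
    state_bits = (xfeature_enabled_mask >> 1) & 0b11
    return osxsave == 1 and state_bits == 0b11
```

This check complements, and does not replace, the feature-flag test for the instruction subset itself.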
VPCOMD Compare Vector Signed Doublewords

Compares corresponding packed signed doublewords in the first and second sources and writes the
result of each comparison to the corresponding doubleword of the destination. The result of each
comparison is a 32-bit value of all 1s (TRUE) or all 0s (FALSE).

There are four operands: VPCOMD dest, src1, src2, imm8

The destination (dest) is an XMM register specified by ModRM.reg. When the results of the
comparisons are written to the destination XMM register, bits [255:128] of the corresponding YMM
register are cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source
(src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.

The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has
an alias mnemonic to facilitate coding.

imm8[2:0]   Comparison              Mnemonic
000         Less Than               VPCOMLTD
001         Less Than or Equal      VPCOMLED
010         Greater Than            VPCOMGTD
011         Greater Than or Equal   VPCOMGED
100         Equal                   VPCOMEQD
101         Not Equal               VPCOMNEQD
110         False                   VPCOMFALSED
111         True                    VPCOMTRUED


Instruction Support 


Form     Subset   Feature Flag
VPCOMD   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                  Encoding
Mnemonic                               XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPCOMD xmm1, xmm2, xmm3/mem128, imm8   8F    RXB.08           0.src1.0.00   CE /r ib

Related Instructions 

VPCOMUB, VPCOMUW, VPCOMUD, VPCOMUQ, VPCOMB, VPCOMW, VPCOMQ 

rFLAGS Affected 

None 
MXCSR Flags Affected

None

Exceptions

                                    Mode
Exception                    Real   Virt   Prot   Cause of Exception
Invalid opcode, #UD                         X     Instruction not supported, as indicated by CPUID feature identifier.
                              X      X            XOP instructions are only recognized in protected mode.
                                            X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                            X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                            X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                            X     Lock prefix (F0h) preceding opcode.
Device not available, #NM                   X     CR0.TS = 1.
Stack, #SS                                  X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                     X     Memory address exceeding data segment limit or non-canonical.
                                            X     Null data segment used to reference memory.
Page fault, #PF                             X     Instruction execution caused a page fault.
Alignment check, #AC                        X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPCOMQ Compare Vector Signed Quadwords

Compares corresponding packed signed quadwords in the first and second sources and writes the
result of each comparison to the corresponding quadword of the destination. The result of each
comparison is a 64-bit value of all 1s (TRUE) or all 0s (FALSE).

There are four operands: VPCOMQ dest, src1, src2, imm8

The destination (dest) is an XMM register specified by ModRM.reg. When the result is written to the
destination XMM register, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source
(src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.

The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has
an alias mnemonic to facilitate coding.

imm8[2:0]   Comparison              Mnemonic
000         Less Than               VPCOMLTQ
001         Less Than or Equal      VPCOMLEQ
010         Greater Than            VPCOMGTQ
011         Greater Than or Equal   VPCOMGEQ
100         Equal                   VPCOMEQQ
101         Not Equal               VPCOMNEQQ
110         False                   VPCOMFALSEQ
111         True                    VPCOMTRUEQ


Instruction Support 


Form     Subset   Feature Flag
VPCOMQ   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                  Encoding
Mnemonic                               XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPCOMQ xmm1, xmm2, xmm3/mem128, imm8   8F    RXB.08           0.src1.0.00   CF /r ib

Related Instructions 

VPCOMUB, VPCOMUW, VPCOMUD, VPCOMUQ, VPCOMB, VPCOMW, VPCOMD 

rFLAGS Affected 

None 
MXCSR Flags Affected

None

Exceptions

                                    Mode
Exception                    Real   Virt   Prot   Cause of Exception
Invalid opcode, #UD                         X     Instruction not supported, as indicated by CPUID feature identifier.
                              X      X            XOP instructions are only recognized in protected mode.
                                            X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                            X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                            X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                            X     Lock prefix (F0h) preceding opcode.
Device not available, #NM                   X     CR0.TS = 1.
Stack, #SS                                  X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                     X     Memory address exceeding data segment limit or non-canonical.
                                            X     Null data segment used to reference memory.
Page fault, #PF                             X     Instruction execution caused a page fault.
Alignment check, #AC                        X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPCOMUB Compare Vector Unsigned Bytes

Compares corresponding packed unsigned bytes in the first and second sources and writes the result
of each comparison to the corresponding byte of the destination. The result of each comparison is an
8-bit value of all 1s (TRUE) or all 0s (FALSE).

There are four operands: VPCOMUB dest, src1, src2, imm8

The destination (dest) is an XMM register specified by ModRM.reg. When the result is written to the
destination XMM register, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source
(src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.

The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has
an alias mnemonic to facilitate coding.

imm8[2:0]   Comparison              Mnemonic
000         Less Than               VPCOMLTUB
001         Less Than or Equal      VPCOMLEUB
010         Greater Than            VPCOMGTUB
011         Greater Than or Equal   VPCOMGEUB
100         Equal                   VPCOMEQUB
101         Not Equal               VPCOMNEQUB
110         False                   VPCOMFALSEUB
111         True                    VPCOMTRUEUB
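The signed and unsigned compare instructions differ only in how each element is interpreted. The sketch below makes that difference concrete with one generic model; it is illustrative only, the names `PREDICATES`, `FORMATS`, and `vpcom` are hypothetical, and operands are modeled as 16-byte little-endian `bytes` objects.

```python
import operator
import struct

# Predicates indexed by imm8[2:0], in the order of the comparison table.
PREDICATES = [operator.lt, operator.le, operator.gt, operator.ge,
              operator.eq, operator.ne,
              lambda a, b: False, lambda a, b: True]

# struct format codes per element width in bytes: (unsigned, signed).
FORMATS = {1: ("B", "b"), 2: ("H", "h"), 4: ("I", "i"), 8: ("Q", "q")}


def vpcom(src1, src2, imm8, width, signed):
    """Model a 128-bit packed compare on elements of `width` bytes."""
    count = 16 // width
    unsigned_code, signed_code = FORMATS[width]
    in_fmt = "<%d%s" % (count, signed_code if signed else unsigned_code)
    predicate = PREDICATES[imm8 & 0b111]
    all_ones = (1 << (8 * width)) - 1  # TRUE result for one element
    results = [
        all_ones if predicate(a, b) else 0
        for a, b in zip(struct.unpack(in_fmt, src1),
                        struct.unpack(in_fmt, src2))
    ]
    return struct.pack("<%d%s" % (count, unsigned_code), *results)
```

With byte 0 of src1 equal to 80h and byte 0 of src2 equal to 01h, a Less Than compare is FALSE when the bytes are treated as unsigned (128 versus 1) and TRUE when they are treated as signed (-128 versus 1).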


Instruction Support 


Form      Subset   Feature Flag
VPCOMUB   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                   Encoding
Mnemonic                                XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPCOMUB xmm1, xmm2, xmm3/mem128, imm8   8F    RXB.08           0.src1.0.00   EC /r ib

Related Instructions 

VPCOMUW, VPCOMUD, VPCOMUQ, VPCOMB, VPCOMW, VPCOMD, VPCOMQ 

rFLAGS Affected 

None 
MXCSR Flags Affected

None

Exceptions

                                    Mode
Exception                    Real   Virt   Prot   Cause of Exception
Invalid opcode, #UD                         X     Instruction not supported, as indicated by CPUID feature identifier.
                              X      X            XOP instructions are only recognized in protected mode.
                                            X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                            X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                            X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                            X     Lock prefix (F0h) preceding opcode.
Device not available, #NM                   X     CR0.TS = 1.
Stack, #SS                                  X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                     X     Memory address exceeding data segment limit or non-canonical.
                                            X     Null data segment used to reference memory.
Page fault, #PF                             X     Instruction execution caused a page fault.
Alignment check, #AC                        X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPCOMUD Compare Vector Unsigned Doublewords

Compares corresponding packed unsigned doublewords in the first and second sources and writes the
result of each comparison to the corresponding doubleword of the destination. The result of each
comparison is a 32-bit value of all 1s (TRUE) or all 0s (FALSE).

There are four operands: VPCOMUD dest, src1, src2, imm8

The destination (dest) is an XMM register specified by ModRM.reg. When the results are written to
the destination XMM register, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source
(src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.

The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has
an alias mnemonic to facilitate coding.

imm8[2:0]   Comparison              Mnemonic
000         Less Than               VPCOMLTUD
001         Less Than or Equal      VPCOMLEUD
010         Greater Than            VPCOMGTUD
011         Greater Than or Equal   VPCOMGEUD
100         Equal                   VPCOMEQUD
101         Not Equal               VPCOMNEQUD
110         False                   VPCOMFALSEUD
111         True                    VPCOMTRUEUD


Instruction Support 


Form      Subset   Feature Flag
VPCOMUD   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                   Encoding
Mnemonic                                XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPCOMUD xmm1, xmm2, xmm3/mem128, imm8   8F    RXB.08           0.src1.0.00   EE /r ib

Related Instructions 

VPCOMUB, VPCOMUW, VPCOMUQ, VPCOMB, VPCOMW, VPCOMD, VPCOMQ 

rFLAGS Affected 

None 


26568 — Rev. 3.23—February 2019 


MXCSR Flags Affected 

None 


Exceptions 


                                    Mode
Exception                    Real   Virt   Prot   Cause of Exception
Invalid opcode, #UD                         X     Instruction not supported, as indicated by CPUID feature identifier.
                              X      X            XOP instructions are only recognized in protected mode.
                                            X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                            X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                            X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                            X     Lock prefix (F0h) preceding opcode.
Device not available, #NM                   X     CR0.TS = 1.
Stack, #SS                                  X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                     X     Memory address exceeding data segment limit or non-canonical.
                                            X     Null data segment used to reference memory.
Page fault, #PF                             X     Instruction execution caused a page fault.
Alignment check, #AC                        X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPCOMUQ Compare Vector Unsigned Quadwords

Compares corresponding packed unsigned quadwords in the first and second sources and writes the
result of each comparison to the corresponding quadword of the destination. The result of each
comparison is a 64-bit value of all 1s (TRUE) or all 0s (FALSE).

There are four operands: VPCOMUQ dest, src1, src2, imm8

The destination (dest) is an XMM register specified by ModRM.reg. When the results are written to
the destination XMM register, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source
(src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.

The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has
an alias mnemonic to facilitate coding.

imm8[2:0]   Comparison              Mnemonic
000         Less Than               VPCOMLTUQ
001         Less Than or Equal      VPCOMLEUQ
010         Greater Than            VPCOMGTUQ
011         Greater Than or Equal   VPCOMGEUQ
100         Equal                   VPCOMEQUQ
101         Not Equal               VPCOMNEQUQ
110         False                   VPCOMFALSEUQ
111         True                    VPCOMTRUEUQ


Instruction Support 


Form      Subset   Feature Flag
VPCOMUQ   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                   Encoding
Mnemonic                                XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPCOMUQ xmm1, xmm2, xmm3/mem128, imm8   8F    RXB.08           0.src1.0.00   EF /r ib

Related Instructions 

VPCOMUB, VPCOMUW, VPCOMUD, VPCOMB, VPCOMW, VPCOMD, VPCOMQ 

rFLAGS Affected 

None 
MXCSR Flags Affected

None

Exceptions

                                    Mode
Exception                    Real   Virt   Prot   Cause of Exception
Invalid opcode, #UD                         X     Instruction not supported, as indicated by CPUID feature identifier.
                              X      X            XOP instructions are only recognized in protected mode.
                                            X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                            X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                            X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                            X     Lock prefix (F0h) preceding opcode.
Device not available, #NM                   X     CR0.TS = 1.
Stack, #SS                                  X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                     X     Memory address exceeding data segment limit or non-canonical.
                                            X     Null data segment used to reference memory.
Page fault, #PF                             X     Instruction execution caused a page fault.
Alignment check, #AC                        X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPCOMUW Compare Vector Unsigned Words

Compares corresponding packed unsigned words in the first and second sources and writes the result
of each comparison to the corresponding word of the destination. The result of each comparison is a
16-bit value of all 1s (TRUE) or all 0s (FALSE).

There are four operands: VPCOMUW dest, src1, src2, imm8

The destination (dest) is an XMM register specified by ModRM.reg. When the results are written to
the destination XMM register, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source
(src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.

The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has
an alias mnemonic to facilitate coding.

imm8[2:0]   Comparison              Mnemonic
000         Less Than               VPCOMLTUW
001         Less Than or Equal      VPCOMLEUW
010         Greater Than            VPCOMGTUW
011         Greater Than or Equal   VPCOMGEUW
100         Equal                   VPCOMEQUW
101         Not Equal               VPCOMNEQUW
110         False                   VPCOMFALSEUW
111         True                    VPCOMTRUEUW


Instruction Support 


Form      Subset   Feature Flag
VPCOMUW   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                   Encoding
Mnemonic                                XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPCOMUW xmm1, xmm2, xmm3/mem128, imm8   8F    RXB.08           0.src1.0.00   ED /r ib

Related Instructions 

VPCOMUB, VPCOMUD, VPCOMUQ, VPCOMB, VPCOMW, VPCOMD, VPCOMQ 

rFLAGS Affected 

None 
MXCSR Flags Affected

None

Exceptions

                                    Mode
Exception                    Real   Virt   Prot   Cause of Exception
Invalid opcode, #UD                         X     Instruction not supported, as indicated by CPUID feature identifier.
                              X      X            XOP instructions are only recognized in protected mode.
                                            X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                            X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                            X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                            X     Lock prefix (F0h) preceding opcode.
Device not available, #NM                   X     CR0.TS = 1.
Stack, #SS                                  X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                     X     Memory address exceeding data segment limit or non-canonical.
                                            X     Null data segment used to reference memory.
Page fault, #PF                             X     Instruction execution caused a page fault.
Alignment check, #AC                        X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPCOMW Compare Vector Signed Words

Compares corresponding packed signed words in the first and second sources and writes the result of
each comparison to the corresponding word of the destination. The result of each comparison is a
16-bit value of all 1s (TRUE) or all 0s (FALSE).

There are four operands: VPCOMW dest, src1, src2, imm8

The destination (dest) is an XMM register specified by ModRM.reg. When the results are written to
the destination XMM register, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source
(src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.

The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has
an alias mnemonic to facilitate coding.

imm8[2:0]   Comparison              Mnemonic
000         Less Than               VPCOMLTW
001         Less Than or Equal      VPCOMLEW
010         Greater Than            VPCOMGTW
011         Greater Than or Equal   VPCOMGEW
100         Equal                   VPCOMEQW
101         Not Equal               VPCOMNEQW
110         False                   VPCOMFALSEW
111         True                    VPCOMTRUEW


Instruction Support 


Form     Subset   Feature Flag
VPCOMW   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                  Encoding
Mnemonic                               XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPCOMW xmm1, xmm2, xmm3/mem128, imm8   8F    RXB.08           0.src1.0.00   CD /r ib

Related Instructions 

VPCOMUB, VPCOMUW, VPCOMUD, VPCOMUQ, VPCOMB, VPCOMD, VPCOMQ 

rFLAGS Affected 

None 
MXCSR Flags Affected

None

Exceptions

                                    Mode
Exception                    Real   Virt   Prot   Cause of Exception
Invalid opcode, #UD                         X     Instruction not supported, as indicated by CPUID feature identifier.
                              X      X            XOP instructions are only recognized in protected mode.
                                            X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                            X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                            X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                            X     Lock prefix (F0h) preceding opcode.
Device not available, #NM                   X     CR0.TS = 1.
Stack, #SS                                  X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                     X     Memory address exceeding data segment limit or non-canonical.
                                            X     Null data segment used to reference memory.
Page fault, #PF                             X     Instruction execution caused a page fault.
Alignment check, #AC                        X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPERM2F128 Permute Floating-Point 128-bit

For each octword of a 256-bit destination, copies 128 bits of floating-point data from a selected
octword of one of two 256-bit source operands, or writes zero, as specified by an immediate-byte
operand.

The immediate operand is encoded as follows.

Destination   Immediate-Byte   Value of    Source 1      Source 2
              Bit Field        Bit Field   Bits Copied   Bits Copied
[127:0]       [1:0]            00          [127:0]       -
                               01          [255:128]     -
                               10          -             [127:0]
                               11          -             [255:128]
              Setting imm8[3] clears bits [127:0] of the destination; imm8[2] is ignored.
[255:128]     [5:4]            00          [127:0]       -
                               01          [255:128]     -
                               10          -             [127:0]
                               11          -             [255:128]
              Setting imm8[7] clears bits [255:128] of the destination; imm8[6] is ignored.


This is a 256-bit extended-form instruction: 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination is a third YMM register. 
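The immediate-byte encoding above can be modeled as follows. This is a minimal sketch, not a definitive implementation: the function name `vperm2f128` is hypothetical, and each 256-bit operand is modeled as a 32-byte `bytes` object whose first 16 bytes are bits [127:0].

```python
def vperm2f128(src1, src2, imm8):
    """Model of VPERM2F128 on 32-byte operands; returns 32 bytes."""
    if len(src1) != 32 or len(src2) != 32:
        raise ValueError("each source must hold 32 bytes")
    # Candidate octwords in the order of the 2-bit field values 00b..11b.
    octwords = [src1[:16], src1[16:], src2[:16], src2[16:]]

    def select(field, zero_bit):
        # The zeroing bit overrides the 2-bit selection.
        return bytes(16) if zero_bit else octwords[field]

    low = select(imm8 & 0b11, (imm8 >> 3) & 1)          # imm8[1:0], imm8[3]
    high = select((imm8 >> 4) & 0b11, (imm8 >> 7) & 1)  # imm8[5:4], imm8[7]
    return low + high  # imm8[2] and imm8[6] are never examined
```

For example, an immediate of 20h places bits [127:0] of the first source in the lower octword and bits [127:0] of the second source in the upper octword, while 08h zeroes the lower octword.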

Instruction Support 


Form         Subset   Feature Flag
VPERM2F128   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                      Encoding
Mnemonic                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPERM2F128 ymm1, ymm2, ymm3/mem256, imm8   C4    RXB.03           0.src1.1.01   06 /r ib

Related Instructions 

VEXTRACTF128, VINSERTF128, VPERMILPD, VPERMILPS 

rFLAGS Affected 

None 
MXCSR Flags Affected

None

Exceptions

                                    Mode
Exception                    Real   Virt   Prot   Cause of Exception
Invalid opcode, #UD                         A     Instruction not supported, as indicated by CPUID feature identifier.
                              A      A            AVX instructions are only recognized in protected mode.
                                            A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                            A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                            A     VEX.W = 1.
                                            A     VEX.L = 0.
                                            A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                            A     Lock prefix (F0h) preceding opcode.
Device not available, #NM                   A     CR0.TS = 1.
Stack, #SS                                  A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                     A     Memory address exceeding data segment limit or non-canonical.
                                            A     Null data segment used to reference memory.
Page fault, #PF                             A     Instruction execution caused a page fault.
Alignment check, #AC                        A     Memory operand not 16-byte aligned when alignment checking enabled.

A — AVX exception. 
VPERM2I128 Permute Integer 128-bit

For each octword of a 256-bit destination, copies 128 bits of integer data from a selected octword
of one of two 256-bit source operands, or writes zero, as specified by an immediate-byte operand.

The immediate operand is encoded as follows.

Destination   Immediate-Byte   Value of    Source 1      Source 2
              Bit Field        Bit Field   Bits Copied   Bits Copied
[127:0]       [1:0]            00          [127:0]       -
                               01          [255:128]     -
                               10          -             [127:0]
                               11          -             [255:128]
              Setting imm8[3] clears bits [127:0] of the destination; imm8[2] is ignored.
[255:128]     [5:4]            00          [127:0]       -
                               01          [255:128]     -
                               10          -             [127:0]
                               11          -             [255:128]
              Setting imm8[7] clears bits [255:128] of the destination; imm8[6] is ignored.


This is a 256-bit extended-form instruction: 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination is a third YMM register. Bits 2 and 6 of the immediate 
byte are ignored. 


Instruction Support 


Form         Subset   Feature Flag
VPERM2I128   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                      Encoding
Mnemonic                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPERM2I128 ymm1, ymm2, ymm3/mem256, imm8   C4    RXB.03           0.src1.1.01   46 /r ib

Related Instructions 

VEXTRACTI128, VEXTRACTF128, VINSERTI128, VINSERTF128, VPERMILPD, VPERMILPS 

rFLAGS Affected 

None 
MXCSR Flags Affected

None

Exceptions

                                    Mode
Exception                    Real   Virt   Prot   Cause of Exception
Invalid opcode, #UD                         A     Instruction not supported, as indicated by CPUID feature identifier.
                              A      A            AVX instructions are only recognized in protected mode.
                                            A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                            A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                            A     VEX.W = 1.
                                            A     VEX.L = 0.
                                            A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                            A     Lock prefix (F0h) preceding opcode.
Device not available, #NM                   A     CR0.TS = 1.
Stack, #SS                                  A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                     A     Memory address exceeding data segment limit or non-canonical.
                                            A     Null data segment used to reference memory.
Page fault, #PF                             A     Instruction execution caused a page fault.
Alignment check, #AC                        A     Memory operand not 16-byte aligned when alignment checking enabled.

A — AVX exception. 
VPERMD Packed Permute Doubleword

Copies selected doublewords from a 256-bit value located either in memory or a YMM register to
specific doublewords of the destination YMM register. For each doubleword of the destination,
selection of which doubleword to copy from the source is specified by a selector field in the
corresponding doubleword of a YMM register.

There is a single form of this instruction:

VPERMD dest, src1, src2

The first source operand provides eight 3-bit selectors, each selector occupying the least-significant
bits of a doubleword. Each selector specifies the index of the doubleword of the second source
operand to be copied to the destination. The doubleword in the destination that each selector controls
is based on its position within the first source operand.

The index value may be the same in multiple selectors. This results in multiple copies of the same
source doubleword being copied to the destination.

There is no 128-bit form of this instruction.

YMM Encoding

The destination is a YMM register. The first source operand is a YMM register and the second source
operand is either a YMM register or a 256-bit memory location.
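The selector mechanism described above can be sketched in a few lines. This model is illustrative only: the function name `vpermd` is hypothetical, and each 256-bit operand is modeled as 32 little-endian bytes holding eight doublewords, with doubleword 0 first.

```python
import struct


def vpermd(selectors, source):
    """Model of VPERMD.

    selectors: first source operand, eight doublewords whose low 3 bits
               each index a doubleword of `source`.
    source:    second source operand, eight doublewords of data.
    """
    indexes = struct.unpack("<8I", selectors)
    data = struct.unpack("<8I", source)
    # Only the three least-significant bits of each selector are used.
    return struct.pack("<8I", *(data[index & 0b111] for index in indexes))
```

Because nothing prevents two selectors from holding the same index, a selector operand of all zeros broadcasts doubleword 0 of the second source to every destination doubleword.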

Instruction Support 


Form     Subset   Feature Flag
VPERMD   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

Instruction Encoding

                                             Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPERMD ymm1, ymm2, ymm3/mem256    C4    RXB.02           0.src1.1.01   36 /r

Related Instructions

VPERMQ, VPERMPD, VPERMPS

rFLAGS Affected

None

MXCSR Flags Affected

None
Exceptions

                                    Mode
Exception                    Real   Virt   Prot   Cause of Exception
Invalid opcode, #UD           A      A      A     Instruction not supported, as indicated by CPUID feature identifier.
                              A      A            AVX instructions are only recognized in protected mode.
                              A      A      A     CR0.EM = 1.
                              A      A      A     CR4.OSFXSR = 0.
                                            A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                            A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                            A     VEX.L = 0.
                                            A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                              A      A      A     Lock prefix (F0h) preceding opcode.
Device not available, #NM     A      A      A     CR0.TS = 1.
Stack, #SS                    A      A      A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       A      A      A     Memory address exceeding data segment limit or non-canonical.
                                            A     Null data segment used to reference memory.
Alignment check, #AC                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                      A      A     Instruction execution caused a page fault.

A — AVX2 exception 
VPERMIL2PD Permute Two-Source Double-Precision Floating-Point

Copies a selected quadword from one of two source operands to a selected quadword of the
destination or clears the selected quadword of the destination. Values in a third source operand and
an immediate two-bit operand control the operation.

There are 128-bit and 256-bit versions of this instruction. Both versions have five operands:
VPERMIL2PD dest, src1, src2, src3, m2z.

The first four operands are either 128 bits or 256 bits wide, as determined by VEX.L. When the
destination is an XMM register, bits [255:128] of the corresponding YMM register are cleared.

The third source operand is a selector that specifies how quadwords are copied or cleared in the
destination. The selector contains one selector element for each quadword of the destination register.

Selector for 128-bit Instruction Form

Bits [127:64]   Bits [63:0]
S1              S0

The selector for the 128-bit instruction form is an octword composed of two quadword selector
elements S0 and S1. S0 (the lower quadword) controls the value written to destination quadword 0 (bits
[63:0]) and S1 (the upper quadword) controls the destination quadword 1 (bits [127:64]).

Selector for 256-bit Instruction Form

Bits [255:192]   Bits [191:128]   Bits [127:64]   Bits [63:0]
S3               S2               S1              S0

The selector for the 256-bit instruction form is a double octword and adds two more selector elements
S2 and S3. S0 controls the value written to the destination quadword 0 (bits [63:0]), S1 controls the
destination quadword 1 (bits [127:64]), S2 controls the destination quadword 2 (bits [191:128]), and
S3 controls the destination quadword 3 (bits [255:192]).

The layout of each selector element is as follows:

[Figure: selector element layout, with bit boundaries marked at 63, 4, 3, 2, 1, and 0]

The fields are defined as follows:
• Sel — Select. Selects the source quadword to copy into the corresponding quadword of the
destination:

Sel Value   Source Selected for Destination   Source Selected for Destination
            Quadwords 0 and 1 (both forms)    Quadwords 2 and 3 (256-bit form)
00b         src1[63:0]                        src1[191:128]
01b         src1[127:64]                      src1[255:192]
10b         src2[63:0]                        src2[191:128]
11b         src2[127:64]                      src2[255:192]


• M — Match bit. The combination of the Match bit in each selector element and the value of the 
M2Z field determines if the Select field is overridden. This is described below. 


m2z immediate operand

The fifth operand is m2z. The assembler uses this 2-bit value to encode the M2Z field in the
instruction. M2Z occupies bits [1:0] of an immediate byte. Bits [7:4] of the same byte are used to
select one of 16 YMM/XMM registers. This dual use of the immediate byte is indicated in the
instruction synopsis by the symbol “is5”.

The immediate byte is defined as follows.

Bits    Mnemonic   Description
[7:4]   SRS        Source Register Select
[3:2]   -          Reserved, IGN
[1:0]   M2Z        Match to Zero

Fields are defined as follows: 

• SRS — Source Register Select. As with many other extended instructions, bits in the immediate 
byte are used to select a source operand register. This field is set by the assembler based on the 
operands listed in the instruction. See discussion in “ src2 and src3 Operand Addressing” below. 

• M2Z — Match to Zero. This field, combined with the M bit of the selector element, controls the 
function of the Sel field as follows: 


M2Z Field 

Selector M Bit 

Value Loaded into Destination Quadword 

OXb 

X 

Source quadword selected by selector element Sel field. 

10b 

0 

Source quadword selected by selector element Sel field. 

10b 

1 

Zero 

11b 

0 

Zero 

11b 

1 

Source quadword selected by selector element Sel field. 
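
The Sel and M2Z tables can be combined into a small reference model. The following Python sketch is illustrative only and is not taken from the manual: the function name and the list-based operand representation are hypothetical, and it assumes that Sel occupies bits [2:1] and M occupies bit [3] of each selector element, which is what the bit-position figure above suggests.

```python
def vpermil2pd(src1, src2, selector, m2z):
    """Model VPERMIL2PD on lists of quadword values (index 0 = bits [63:0]).

    Pass 2-element lists for the 128-bit form or 4-element lists for the
    256-bit form. Assumed element layout: Sel = bits [2:1], M = bit [3].
    """
    if not (len(src1) == len(src2) == len(selector)) or len(selector) not in (2, 4):
        raise ValueError("operands must all hold 2 or all hold 4 quadwords")
    dest = []
    for i, element in enumerate(selector):
        sel = (element >> 1) & 0b11
        m = (element >> 3) & 1
        lane_base = (i // 2) * 2            # each 128-bit half selects within itself
        source = src1 if sel < 2 else src2  # Sel 00b/01b -> src1, 10b/11b -> src2
        value = source[lane_base + (sel & 1)]
        # M2Z = 10b zeroes the element when M = 1; M2Z = 11b zeroes it when M = 0.
        if m2z & 0b10 and m != (m2z & 1):
            value = 0
        dest.append(value)
    return dest
```

With M2Z = 0Xb the M bits have no effect, so the same selector can be reused with different m2z values to switch zeroing on or off.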


src2 and src3 Operand Addressing 

In 64-bit mode, VEX.W and bits [7:4] of the immediate byte specify src2 and src3:

• When VEX.W = 0, src2 is either a register or a memory location specified by ModRM.r/m and
src3 is a register specified by bits [7:4] of the immediate byte.

• When VEX.W = 1, src2 is a register specified by bits [7:4] of the immediate byte and src3 is either
a register or a memory location specified by ModRM.r/m.

In non-64-bit mode, bit 7 is ignored. 
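
As an illustration of the dual-use immediate byte, the following Python sketch packs and unpacks the SRS and M2Z fields. The helper names are hypothetical and the code only mirrors the field positions given in the table above; it is not an assembler implementation.

```python
def encode_is5(register_number, m2z):
    """Build the immediate byte: SRS in bits [7:4], M2Z in bits [1:0]."""
    if not 0 <= register_number <= 15:
        raise ValueError("register number must be 0..15")
    if not 0 <= m2z <= 3:
        raise ValueError("m2z must be a 2-bit value")
    return (register_number << 4) | m2z  # bits [3:2] are reserved and left at 0


def decode_is5(imm8, long_mode=True):
    """Return (register_number, m2z) from an immediate byte."""
    srs = (imm8 >> 4) & 0xF
    if not long_mode:
        srs &= 0x7  # outside 64-bit mode, bit 7 of the byte is ignored
    return srs, imm8 & 0b11
```

For example, selecting register 4 with m2z = 10b yields the byte 42h.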


Instruction Support 


Form         Subset   Feature Flag
VPERMIL2PD   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
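
The feature flags quoted in the Instruction Support tables of this section are single bits of a CPUID result register. The Python sketch below only shows the bit tests; the function names are hypothetical, and obtaining the register values (by executing CPUID or reading XCR0) is assumed to happen elsewhere.

```python
def bit(value, position):
    """Return True if the given bit of a register value is set."""
    return bool((value >> position) & 1)


def has_xop(ecx_fn8000_0001):
    return bit(ecx_fn8000_0001, 11)   # CPUID Fn8000_0001_ECX[XOP]


def has_avx(ecx_fn0000_0001):
    return bit(ecx_fn0000_0001, 28)   # CPUID Fn0000_0001_ECX[AVX]


def has_avx2(ebx_fn0000_0007_x0):
    return bit(ebx_fn0000_0007_x0, 5)  # CPUID Fn0000_0007_EBX[AVX2]_x0


def ymm_state_enabled(xfeature_enabled_mask):
    # The exception tables list XFEATURE_ENABLED_MASK[2:1] != 11b as a #UD cause.
    return (xfeature_enabled_mask >> 1) & 0b11 == 0b11
```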


Instruction Encoding 


                                                 Encoding
Mnemonic                                         VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPERMIL2PD xmm1, xmm2, xmm3/mem128, xmm4, m2z    C4    RXB.03           0.src1.0.01   49 /r is5
VPERMIL2PD xmm1, xmm2, xmm3, xmm4/mem128, m2z    C4    RXB.03           1.src1.0.01   49 /r is5
VPERMIL2PD ymm1, ymm2, ymm3/mem256, ymm4, m2z    C4    RXB.03           0.src1.1.01   49 /r is5
VPERMIL2PD ymm1, ymm2, ymm3, ymm4/mem256, m2z    C4    RXB.03           1.src1.1.01   49 /r is5

NOTE: VPERMIL2PD is encoded using the VEX prefix even though it is an XOP instruction. 

Related Instructions 

VPERM2F128, VPERMIL2PS, VPERMILPD, VPERMILPS, VPPERM 

rFLAGS Affected 

None 


MXCSR Flags Affected 

None 
Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception
VPERMIL2PS Permute Two-Source Single-Precision Floating-Point

Copies a selected doubleword from one of two source operands to a selected doubleword of the
destination or clears the selected doubleword of the destination. Values in a third source operand and
an immediate two-bit operand control operation.

There are 128-bit and 256-bit versions of this instruction. Both versions have five operands:

VPERMIL2PS dest, src1, src2, src3, m2z

The first four operands are either 128 bits or 256 bits wide, as determined by VEX.L. When the
destination is an XMM register, bits [255:128] of the corresponding YMM register are cleared.

The third source operand is a selector that specifies how doublewords are copied or cleared in the
destination. The selector contains one selector element for each doubleword of the destination register.

Selector for 128-bit Instruction Form

Bits      [127:96]   [95:64]   [63:32]   [31:0]
Element   S3         S2        S1        S0

The selector for the 128-bit instruction form is an octword containing four selector elements S0-S3.
S0 controls the value written to the destination doubleword 0 (bits [31:0]), S1 controls the destination
doubleword 1 (bits [63:32]), S2 controls the destination doubleword 2 (bits [95:64]), and S3 controls
the destination doubleword 3 (bits [127:96]).


Selector for 256-bit Instruction Form

Bits      [255:224]   [223:192]   [191:160]   [159:128]
Element   S7          S6          S5          S4

Bits      [127:96]    [95:64]     [63:32]     [31:0]
Element   S3          S2          S1          S0

The selector for the 256-bit instruction form is a double octword and adds four more selector
elements S4-S7. S4 controls the value written to the destination doubleword 4 (bits [159:128]), S5
controls the destination doubleword 5 (bits [191:160]), S6 controls the destination doubleword 6 (bits
[223:192]), and S7 controls the destination doubleword 7 (bits [255:224]).

The layout of each selector element is as follows. 


Bits     Mnemonic   Description
[31:4]   —          Reserved, IGN
[3]      M          Match
[2:0]    Sel        Select


The fields are defined as follows: 
• Sel — Select. Selects the source doubleword to copy into the corresponding doubleword of the
destination:

Sel Value   Source Selected for Destination          Source Selected for Destination
            Doublewords 0, 1, 2 and 3 (both forms)   Doublewords 4, 5, 6 and 7 (256-bit form)
000b        src1[31:0]                               src1[159:128]
001b        src1[63:32]                              src1[191:160]
010b        src1[95:64]                              src1[223:192]
011b        src1[127:96]                             src1[255:224]
100b        src2[31:0]                               src2[159:128]
101b        src2[63:32]                              src2[191:160]
110b        src2[95:64]                              src2[223:192]
111b        src2[127:96]                             src2[255:224]


• M — Match. The combination of the M bit in each selector element and the value of the M2Z field 
determines if the Sel field is overridden. This is described below. 

m2z immediate operand 

The fifth operand is m2z. The assembler uses this 2-bit value to encode the M2Z field in the
instruction. M2Z occupies bits [1:0] of an immediate byte. Bits [7:4] of the same byte are used to
select one of 16 YMM/XMM registers. This dual use of the immediate byte is indicated in the
instruction synopsis by the symbol “is5”.

The immediate byte is defined as follows.

Bits    Mnemonic   Description
[7:4]   SRS        Source Register Select
[3:2]   —          Reserved, IGN
[1:0]   M2Z        Match to Zero

Fields are defined as follows:

• SRS — Source Register Select. As with many other extended instructions, bits in the immediate
byte are used to select a source operand register. This field is set by the assembler based on the
operands listed in the instruction. See discussion in “src2 and src3 Operand Addressing” below.

• M2Z — Match to Zero. This field, combined with the M bit of the selector element, controls the
function of the Sel field as follows:

M2Z Field   Selector M Bit   Value Loaded into Destination Doubleword
0Xb         X                Source doubleword selected by Sel field.
10b         0                Source doubleword selected by Sel field.
10b         1                Zero
11b         0                Zero
11b         1                Source doubleword selected by Sel field.
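
The doubleword variant follows the same pattern with a 3-bit Sel field. The Python sketch below is an illustrative model only: the function name and the list-based operand representation are hypothetical, and the field positions (Sel in bits [2:0], M in bit [3]) are those given in the selector element table above.

```python
# M2Z value -> value of the M bit that forces the destination element to zero.
ZERO_WHEN_M = {0b10: 1, 0b11: 0}


def vpermil2ps(src1, src2, selector, m2z):
    """Model VPERMIL2PS on lists of doubleword values (index 0 = bits [31:0]).

    Pass 4-element lists for the 128-bit form or 8-element lists for the
    256-bit form.
    """
    if not (len(src1) == len(src2) == len(selector)) or len(selector) not in (4, 8):
        raise ValueError("operands must all hold 4 or all hold 8 doublewords")
    dest = []
    for i, element in enumerate(selector):
        sel = element & 0b111
        m = (element >> 3) & 1
        lane_base = (i // 4) * 4            # upper half selects from upper half
        source = src1 if sel < 4 else src2  # Sel 0xxb -> src1, 1xxb -> src2
        value = source[lane_base + (sel & 0b11)]
        if ZERO_WHEN_M.get(m2z & 0b11) == m:  # no entry for M2Z = 0Xb
            value = 0
        dest.append(value)
    return dest
```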


src2 and src3 Operand Addressing 

In 64-bit mode, VEX.W and bits [7:4] of the immediate byte specify src2 and src3:

• When VEX.W = 0, src2 is either a register or a memory location specified by ModRM.r/m and
src3 is a register specified by bits [7:4] of the immediate byte.

• When VEX.W = 1, src2 is a register specified by bits [7:4] of the immediate byte and src3 is either
a register or a memory location specified by ModRM.r/m.

In non-64-bit mode, bit 7 is ignored. 

Instruction Support 


Form         Subset   Feature Flag
VPERMIL2PS   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                 Encoding
Mnemonic                                         VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPERMIL2PS xmm1, xmm2, xmm3/mem128, xmm4, m2z    C4    RXB.03           0.src1.0.01   48 /r is5
VPERMIL2PS xmm1, xmm2, xmm3, xmm4/mem128, m2z    C4    RXB.03           1.src1.0.01   48 /r is5
VPERMIL2PS ymm1, ymm2, ymm3/mem256, ymm4, m2z    C4    RXB.03           0.src1.1.01   48 /r is5
VPERMIL2PS ymm1, ymm2, ymm3, ymm4/mem256, m2z    C4    RXB.03           1.src1.1.01   48 /r is5


NOTE: VPERMIL2PS is encoded using the VEX prefix even though it is an XOP instruction. 

Related Instructions 

VPERM2F128, VPERMIL2PD, VPERMILPD, VPERMILPS, VPPERM 

rFLAGS Affected 

None 


MXCSR Flags Affected 

None 
Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception
VPERMILPD Permute Double-Precision

Copies double-precision floating-point values from a source to a destination. Source and destination
can be selected in two ways. There are different encodings for each selection method.

Selection by bits in a source register or memory location:

Each quadword of the operand is defined as follows.

Bits     Mnemonic   Description
[63:2]   —          Ignored
[1]      Sel        Select
[0]      —          Ignored

One bit in each quadword of the operand controls the corresponding destination quadword. Only bit [1]
is used; bits [63:2] and bit [0] are ignored. When the bit is 0, the lower quadword of the
corresponding 128-bit half of the source is copied; when the bit is 1, the upper quadword is copied.

Selection by bits in an immediate byte: 

Each bit corresponds to a destination quadword. Only bits [3:2] and bits [1:0] are used; bits [7:4] are 
ignored. Selections are defined as follows. 


Destination   Immediate-Byte   Value of    Source 1
Quadword      Bit Field        Bit Field   Bits Copied
Used by 128-bit encoding and 256-bit encoding
[63:0]        [0]              0           [63:0]
                               1           [127:64]
[127:64]      [1]              0           [63:0]
                               1           [127:64]
Used only by 256-bit encoding
[191:128]     [2]              0           [191:128]
                               1           [255:192]
[255:192]     [3]              0           [191:128]
                               1           [255:192]
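
Both selection methods reduce to picking the lower or upper quadword within each 128-bit half. The following Python sketch models them on lists of values; the function names and the list representation are hypothetical and are used only for illustration.

```python
def vpermilpd_imm(src, imm8):
    """Immediate form: bit i of imm8 controls destination quadword i."""
    return [src[(i // 2) * 2 + ((imm8 >> i) & 1)] for i in range(len(src))]


def vpermilpd_reg(src, control):
    """Register/memory form: bit [1] of control quadword i controls
    destination quadword i; all other control bits are ignored."""
    return [src[(i // 2) * 2 + ((c >> 1) & 1)] for i, c in enumerate(control)]
```

For a 128-bit operand, an immediate of 01b swaps the two quadwords, and an immediate of 00b copies the lower quadword to both destination positions.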


This extended-form instruction has both 128-bit and 256-bit encodings.

XMM Encoding 

There are two encodings, one for each selection method: 

• The first source operand is an XMM register. The second source operand is either an XMM 
register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of 
the YMM register that corresponds to the destination are cleared. 

• The first source operand is either an XMM register or a 128-bit memory location. The destination 
is an XMM register. There is a third, immediate byte operand. Bits [255:128] of the YMM register 
that corresponds to the destination are cleared. 

YMM Encoding 

There are two encodings, one for each selection method: 
• The first source operand is a YMM register. The second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register.

• The first source operand is either a YMM register or a 256-bit memory location. The destination is
a YMM register. There is a third, immediate byte operand.

Instruction Support

Form        Subset   Feature Flag
VPERMILPD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                       Encoding
Mnemonic                               VEX   RXB.map_select   W.vvvv.L.pp   Opcode
Selection by source register or memory:
VPERMILPD xmm1, xmm2, xmm3/mem128      C4    RXB.02           0.src1.0.01   0D /r
VPERMILPD ymm1, ymm2, ymm3/mem256      C4    RXB.02           0.src1.1.01   0D /r
Selection by immediate byte operand:
VPERMILPD xmm1, xmm2/mem128, imm8      C4    RXB.03           0.1111.0.01   05 /r ib
VPERMILPD ymm1, ymm2/mem256, imm8      C4    RXB.03           0.1111.1.01   05 /r ib


Related Instructions 

VPERM2F128, VPERMIL2PD, VPERMIL2PS, VPERMILPS, VPPERM 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     A     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.W = 1.
                                        A     VEX.vvvv != 1111b (for versions with immediate byte operand only).
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A     Lock prefix (F0h) preceding opcode.
Device not available, #NM               A     CR0.TS = 1.
Stack, #SS                              A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A     Memory address exceeding data segment limit or non-canonical.
                                        A     Null data segment used to reference memory.
Page fault, #PF                         A     Instruction execution caused a page fault.
Alignment check, #AC                    A     Unaligned memory reference when alignment checking enabled.

A — AVX exception.
VPERMILPS Permute Single-Precision

Copies single-precision floating-point values from a source to a destination. Source and destination
can be selected in two ways. There are different encodings for each selection method.

Selection by bit fields in a source register or memory location:

Each doubleword of the operand is defined as follows.

Bits     Mnemonic   Description
[31:2]   —          Ignored
[1:0]    Sel        Select

Each bit field corresponds to a destination doubleword. Bit values select a source doubleword. Only
bits [1:0] of each doubleword are used; bits [31:2] are ignored. The 128-bit encoding uses four two-bit
fields; the 256-bit version uses eight two-bit fields. Field encoding is as follows.


Destination   Immediate Operand   Value of    Source
Doubleword    Bit Field           Bit Field   Bits Copied
[31:0]        [1:0]               00          [31:0]
                                  01          [63:32]
                                  10          [95:64]
                                  11          [127:96]
[63:32]       [33:32]             00          [31:0]
                                  01          [63:32]
                                  10          [95:64]
                                  11          [127:96]
[95:64]       [65:64]             00          [31:0]
                                  01          [63:32]
                                  10          [95:64]
                                  11          [127:96]
[127:96]      [97:96]             00          [31:0]
                                  01          [63:32]
                                  10          [95:64]
                                  11          [127:96]
Upper 128 bits of 256-bit source and destination used by 256-bit encoding
[159:128]     [129:128]           00          [159:128]
                                  01          [191:160]
                                  10          [223:192]
                                  11          [255:224]
[191:160]     [161:160]           00          [159:128]
                                  01          [191:160]
                                  10          [223:192]
                                  11          [255:224]
[223:192]     [193:192]           00          [159:128]
                                  01          [191:160]
                                  10          [223:192]
                                  11          [255:224]
[255:224]     [225:224]           00          [159:128]
                                  01          [191:160]
                                  10          [223:192]
                                  11          [255:224]


Selection by bit fields in an immediate byte: 

Each bit field corresponds to a destination doubleword. For the 256-bit encoding, the fields specify 
sources and destinations in both the upper and lower 128 bits of the register. Selections are defined as 
follows. 


Destination   Bit Field   Value of    Source
Doubleword                Bit Field   Bits Copied
[31:0]        [1:0]       00          [31:0]
                          01          [63:32]
                          10          [95:64]
                          11          [127:96]
[63:32]       [3:2]       00          [31:0]
                          01          [63:32]
                          10          [95:64]
                          11          [127:96]
[95:64]       [5:4]       00          [31:0]
                          01          [63:32]
                          10          [95:64]
                          11          [127:96]
[127:96]      [7:6]       00          [31:0]
                          01          [63:32]
                          10          [95:64]
                          11          [127:96]
Upper 128 bits of 256-bit source and destination used by 256-bit encoding
[159:128]     [1:0]       00          [159:128]
                          01          [191:160]
                          10          [223:192]
                          11          [255:224]
[191:160]     [3:2]       00          [159:128]
                          01          [191:160]
                          10          [223:192]
                          11          [255:224]
[223:192]     [5:4]       00          [159:128]
                          01          [191:160]
                          10          [223:192]
                          11          [255:224]
[255:224]     [7:6]       00          [159:128]
                          01          [191:160]
                          10          [223:192]
                          11          [255:224]

This extended-form instruction has both 128-bit and 256-bit encodings: 

XMM Encoding 

There are two encodings, one for each selection method: 

• The first source operand is an XMM register. The second source operand is either an XMM 
register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of 
the YMM register that corresponds to the destination are cleared. 

• The first source operand is either an XMM register or a 128-bit memory location. The destination 
is an XMM register. There is a third, immediate byte operand. Bits [255:128] of the YMM register 
that corresponds to the destination are cleared. 

YMM Encoding 

There are two encodings, one for each selection method: 

• The first source operand is a YMM register. The second source operand is either a YMM register 
or a 256-bit memory location. The destination is a third YMM register. 

• The first source operand is either a YMM register or a 256-bit memory location. The destination is 
a YMM register. There is a third, immediate byte operand. 


Instruction Support 


Form        Subset   Feature Flag
VPERMILPS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                       Encoding
Mnemonic                               VEX   RXB.map_select   W.vvvv.L.pp   Opcode
Selection by source register or memory:
VPERMILPS xmm1, xmm2, xmm3/mem128      C4    RXB.02           0.src1.0.01   0C /r
VPERMILPS ymm1, ymm2, ymm3/mem256      C4    RXB.02           0.src1.1.01   0C /r
Selection by immediate byte operand:
VPERMILPS xmm1, xmm2/mem128, imm8      C4    RXB.03           0.1111.0.01   04 /r ib
VPERMILPS ymm1, ymm2/mem256, imm8      C4    RXB.03           0.1111.1.01   04 /r ib

Related Instructions

VPERM2F128, VPERMIL2PD, VPERMIL2PS, VPERMILPD, VPPERM

rFLAGS Affected

None

MXCSR Flags Affected

None


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     A     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.W = 1.
                                        A     VEX.vvvv != 1111b (for versions with immediate byte operand only).
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A     Lock prefix (F0h) preceding opcode.
Device not available, #NM               A     CR0.TS = 1.
Stack, #SS                              A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A     Memory address exceeding data segment limit or non-canonical.
                                        A     Null data segment used to reference memory.
Page fault, #PF                         A     Instruction execution caused a page fault.
Alignment check, #AC                    A     Unaligned memory reference when alignment checking enabled.

A — AVX exception.
VPERMPD Packed Permute Double-Precision Floating-Point

Copies selected quadwords from a 256-bit value located either in memory or a YMM register to
specific quadwords of the destination. For each quadword of the destination, selection of which
quadword to copy from the source is specified by a 2-bit selector field in an immediate byte.

There is a single form of this instruction:

VPERMPD dest, src, imm8

The selection of which quadword of the source operand to copy to each quadword of the destination
is specified by four 2-bit selector fields in the immediate byte. Bits [1:0] specify the index of the
quadword to be copied to the destination quadword 0. Bits [3:2] select the quadword to be copied to
quadword 1, bits [5:4] select the quadword to be copied to quadword 2, and bits [7:6] select the
quadword to be copied to quadword 3.

The index value may be the same in multiple selectors. This results in multiple copies of the same
source quadword being copied to the destination.
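
Unlike VPERMILPD, each selector here can address any of the four source quadwords. A minimal Python sketch of the selection rule follows; the function name and list representation are hypothetical, with index 0 standing for bits [63:0].

```python
def vpermpd(src, imm8):
    """Model VPERMPD: destination quadword i takes the source quadword whose
    index is given by bits [2i+1:2i] of the immediate byte."""
    if len(src) != 4:
        raise ValueError("the source must hold four quadwords")
    return [src[(imm8 >> (2 * i)) & 0b11] for i in range(4)]
```

An immediate of 00h broadcasts source quadword 0 to all four positions, which illustrates the repeated-index case described above.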

There is no 128-bit form of this instruction.

YMM Encoding

The destination is a YMM register. The source operand is a YMM register or a 256-bit memory
location.

Instruction Support

Form      Subset   Feature Flag
VPERMPD   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                     Encoding
Mnemonic                             VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPERMPD ymm1, ymm2/mem256, imm8      C4    RXB.03           1.1111.1.01   01 /r ib

Related Instructions

VPERMD, VPERMQ, VPERMPS

rFLAGS Affected

None

MXCSR Flags Affected

None
Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         A     A     A     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            A     A     A     CR0.EM = 1.
                            A     A     A     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 0.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            A     A     A     Lock prefix (F0h) preceding opcode.
Device not available, #NM   A     A     A     CR0.TS = 1.
Stack, #SS                  A     A     A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     A     A     A     Memory address exceeding data segment limit or non-canonical.
                                        A     Null data segment used to reference memory.
Alignment check, #AC                    A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   A     A     Instruction execution caused a page fault.

A — AVX2 exception
VPERMPS Packed Permute Single-Precision Floating-Point

Copies selected doublewords from a 256-bit value located either in memory or a YMM register to
specific doublewords of the destination YMM register. For each doubleword of the destination,
selection of which doubleword to copy from the source is specified by a selector field in the
corresponding doubleword of a YMM register.

There is a single form of this instruction:

VPERMPS dest, src1, src2

The first source operand provides eight 3-bit selectors, each selector occupying the least-significant
bits of a doubleword. Each selector specifies the index of the doubleword of the second source
operand to be copied to the destination. The doubleword in the destination that each selector controls
is based on its position within the first source operand.

The index value may be the same in multiple selectors. This results in multiple copies of the same
source doubleword being copied to the destination.
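
The selection rule can be sketched in a few lines of Python. The function name and list representation are hypothetical; the sketch also assumes that bits above the 3-bit selector in each doubleword of the first source are ignored, which the text above implies but does not state outright.

```python
def vpermps(selectors, data):
    """Model VPERMPS: selectors is src1 (eight doublewords), data is src2
    (eight doublewords). Destination doubleword i = data[selectors[i] & 7]."""
    if len(selectors) != 8 or len(data) != 8:
        raise ValueError("both operands must hold eight doublewords")
    return [data[s & 0b111] for s in selectors]
```

Note the operand order: the selectors come from the first source operand and the data from the second.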

There is no 128-bit form of this instruction. 

YMM Encoding 

The destination is a YMM register. The first source operand is a YMM register and the second source 
operand is either a YMM register or a 256-bit memory location. 

Instruction Support 


Form      Subset   Feature Flag
VPERMPS   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                     Encoding
Mnemonic                             VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPERMPS ymm1, ymm2, ymm3/mem256      C4    RXB.02           0.src1.1.01   16 /r

Related Instructions

VPERMD, VPERMQ, VPERMPD

rFLAGS Affected

None

MXCSR Flags Affected

None


26568 — Rev. 3.23—February 2019 



Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         A     A     A     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            A     A     A     CR0.EM = 1.
                            A     A     A     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 0.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            A     A     A     Lock prefix (F0h) preceding opcode.
Device not available, #NM   A     A     A     CR0.TS = 1.
Stack, #SS                  A     A     A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     A     A     A     Memory address exceeding data segment limit or non-canonical.
                                        A     Null data segment used to reference memory.
Alignment check, #AC                    A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   A     A     Instruction execution caused a page fault.

A — AVX2 exception
VPERMQ Packed Permute Quadword

Copies selected quadwords from a 256-bit value located either in memory or a YMM register to
specific quadwords of the destination. For each quadword of the destination, selection of which
quadword to copy from the source is specified by a 2-bit selector field in an immediate byte.

There is a single form of this instruction:

VPERMQ dest, src, imm8

The selection of which quadword of the source operand to copy to each quadword of the destination
is specified by four 2-bit selector fields in the immediate byte. Bits [1:0] specify the index of the
quadword to be copied to the destination quadword 0. Bits [3:2] select the quadword to be copied to
quadword 1, bits [5:4] select the quadword to be copied to quadword 2, and bits [7:6] select the
quadword to be copied to quadword 3.

The index value may be the same in multiple selectors. This results in multiple copies of the same
source quadword being copied to the destination.
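
When writing such an instruction by hand it is often easier to start from the desired source index for each destination quadword and derive the immediate byte. The helper below is a hypothetical Python sketch of that packing, not part of any assembler.

```python
def permute_imm8(index0, index1, index2, index3):
    """Pack four source-quadword indices (0..3) into an immediate byte.

    index0 goes to bits [1:0] (destination quadword 0), index3 to bits [7:6].
    """
    indices = (index0, index1, index2, index3)
    if any(not 0 <= index <= 3 for index in indices):
        raise ValueError("each index must be in the range 0..3")
    imm8 = 0
    for position, index in enumerate(indices):
        imm8 |= index << (2 * position)
    return imm8
```

For example, reversing the four quadwords corresponds to indices (3, 2, 1, 0) and the immediate 1Bh.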

There is no 128-bit form of this instruction.

YMM Encoding

The destination is a YMM register. The source operand is a YMM register or a 256-bit memory
location.

Instruction Support

Form     Subset   Feature Flag
VPERMQ   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                    Encoding
Mnemonic                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPERMQ ymm1, ymm2/mem256, imm8      C4    RXB.03           1.1111.1.01   00 /r ib

Related Instructions 

VPERMD, VPERMPD, VPERMPS 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         A     A     A     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            A     A     A     CR0.EM = 1.
                            A     A     A     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 0.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            A     A     A     Lock prefix (F0h) preceding opcode.
Device not available, #NM   A     A     A     CR0.TS = 1.
Stack, #SS                  A     A     A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     A     A     A     Memory address exceeding data segment limit or non-canonical.
                                        A     Null data segment used to reference memory.
Alignment check, #AC                    A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   A     A     Instruction execution caused a page fault.

A — AVX2 exception
VPGATHERDD Conditionally Gather Doublewords, Doubleword Indices

Conditionally loads doubleword values from memory using VSIB addressing with doubleword indices.

The instruction is of the form:

VPGATHERDD dest, mem32[vm32x/y], mask

The loading of each element of the destination register is conditional based on the value of the
corresponding element of the mask (second source operand). If the most-significant bit of the ith
element of the mask is set, the ith element of the destination is loaded from memory using the ith
address of the array of effective addresses calculated using VSIB addressing.

The index register is treated as an array of signed 32-bit values. Doubleword elements of the
destination for which the corresponding mask element is zero are not affected by the operation. If no
exceptions occur, the mask register is set to zero.
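
The exception-free behavior described above can be modeled as follows. This Python sketch is illustrative only: the function names, the use of a bytes object to stand in for memory, and the base and scale parameters (standing in for the VSIB base register and scale factor) are assumptions, and the partial-update behavior on faults is not modeled.

```python
import struct


def to_signed32(value):
    """Interpret a 32-bit pattern as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def vpgatherdd(dest, mask, memory, base, indices, scale=4):
    """Return (new_dest, new_mask) for a gather that completes without faults.

    dest, mask, indices: equal-length lists of 32-bit values (4 or 8 entries).
    memory: bytes-like object holding little-endian data; base is a byte offset.
    """
    new_dest = list(dest)
    for i, index in enumerate(indices):
        if mask[i] & 0x80000000:  # only the most-significant mask bit matters
            address = base + to_signed32(index) * scale
            (new_dest[i],) = struct.unpack_from("<I", memory, address)
    # With no exceptions, the whole mask register ends up zero.
    return new_dest, [0] * len(mask)
```

Elements whose mask bit is clear keep their previous destination value, which is why the model takes the old destination contents as an input.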

Execution of the instruction can be suspended by an exception if the exception is triggered by an
element other than the rightmost element loaded. When this happens, the destination register and the
mask operand may be observed as partially updated. Elements that have been loaded will have their
mask elements set to zero. If any traps or faults are pending from elements that have been loaded,
they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction
breakpoint is not re-triggered when the instruction execution is resumed.

See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode.

There are 128-bit and 256-bit forms of this instruction.

XMM Encoding

The destination is an XMM register. The first source operand is up to four 32-bit values located in
memory. The second source operand (the mask) is an XMM register. The index vector is the four
doublewords of an XMM register. Bits [255:128] of the YMM register that corresponds to the
destination and bits [255:128] of the YMM register that corresponds to the second source (mask)
operand are cleared.

YMM Encoding

The destination is a YMM register. The first source operand is up to eight 32-bit values located in
memory. The second source operand (the mask) is a YMM register. The index vector is the eight
doublewords of a YMM register.

Instruction Support

Form         Subset   Feature Flag
VPGATHERDD   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPGATHERDD xmm1, vm32x, xmm2      C4    RXB.02           0.src2.0.01   90 /r
VPGATHERDD ymm1, vm32y, ymm2      C4    RXB.02           0.src2.1.01   90 /r


Related Instructions 

VGATHERDPD, VGATHERDPS, VGATHERQPD, VGATHERQPS, VPGATHERDQ, VPGATHERQD,
VPGATHERQQ

rFLAGS Affected 

RF 

MXCSR Flags Affected 

None 


Exceptions 


                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         A     A     A     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A     Lock prefix (F0h) preceding opcode.
                                        A     MODRM.mod = 11b
                                        A     MODRM.rm != 100b
                                        A     YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM               A     CR0.TS = 1.
Stack, #SS                              A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A     Memory address exceeding data segment limit or non-canonical.
                                        A     Null data segment used to reference memory.
Alignment check, #AC                    A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                   A     A     Instruction execution caused a page fault.

A — AVX2 exception
VPGATHERDQ    Conditionally Gather Quadwords, Doubleword Indices

Conditionally loads quadword values from memory using VSIB addressing with doubleword indices. 
The instruction is of the form: 

VPGATHERDQ dest, mem64[vm32x], mask 

The loading of each element of the destination register is conditional based on the value of the
corresponding element of the mask (second source operand). If the most-significant bit of the ith
element of the mask is set, the ith element of the destination is loaded from memory using the ith
address of the array of effective addresses calculated using VSIB addressing.

The index register is treated as an array of signed 32-bit values. Quadword elements of the destination 
for which the corresponding mask element is zero are not affected by the operation. If no exceptions 
occur, the mask register is set to zero. 
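The conditional load described above can be sketched as a small software model. This is an illustration under stated assumptions, not a description of the hardware: registers are modelled as Python lists of 64-bit elements, `memory` is a hypothetical dictionary from byte address to quadword value, and the displacement part of VSIB addressing is omitted.

```python
def vpgatherdq(dest, mask, indices, base, scale, memory):
    """Model VPGATHERDQ for the case in which no exception occurs.

    dest, mask: lists of 64-bit elements; indices: 32-bit values, one per element.
    Returns the new (dest, mask) pair.
    """
    dest, mask = list(dest), list(mask)
    for i in range(len(dest)):
        if (mask[i] >> 63) & 1:            # only the most-significant mask bit is tested
            idx = indices[i]
            if idx >= 1 << 31:             # interpret the index as a signed 32-bit value
                idx -= 1 << 32
            dest[i] = memory[base + idx * scale]
        # elements whose mask bit is clear keep their previous contents
    return dest, [0] * len(mask)           # the mask register ends up all zero


memory = {0x0FF8: 33, 0x1000: 11, 0x1008: 22}
new_dest, new_mask = vpgatherdq([0xAA, 0xBB], [1 << 63, 0], [0xFFFFFFFF, 1], 0x1000, 8, memory)
```

Here element 0 uses index -1 and is loaded from address 0FF8h, while element 1 is left unchanged because its mask bit is clear.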

Execution of the instruction can be suspended by an exception if the exception is triggered by an
element other than the rightmost element loaded. When this happens, the destination register and the
mask operand may be observed as partially updated. Elements that have been loaded will have their 
mask elements set to zero. If any traps or faults are pending from elements that have been loaded, 
they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction 
breakpoint is not re-triggered when the instruction execution is resumed. 

See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode. 

There are 128-bit and 256-bit forms of this instruction.

XMM Encoding 

The destination is an XMM register. The first source operand is up to two 64-bit values located in 
memory. The second source operand (the mask) is an XMM register. The index vector is the two 
low-order doublewords of an XMM register; the two high-order doublewords of the index register are 
not used. Bits [255:128] of the YMM register that corresponds to the destination and bits [255:128] of 
the YMM register that corresponds to the second source (mask) operand are cleared. 

YMM Encoding 

The destination is a YMM register. The first source operand is up to four 64-bit values located in 
memory. The second source operand (the mask) is a YMM register. The index vector is the four
doublewords of an XMM register.

Instruction Support

Form          Subset   Feature Flag
VPGATHERDQ    AVX2     Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                              Encoding
Mnemonic                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPGATHERDQ xmm1, vm32x, xmm2    C4    RXB.02           1.src2.0.01   90 /r
VPGATHERDQ ymm1, vm32x, ymm2    C4    RXB.02           1.src2.1.01   90 /r


Related Instructions 

VGATHERDPD, VGATHERDPS, VGATHERQPD, VGATHERQPS, VPGATHERDD, VPGATHERQD,
VPGATHERQQ

rFLAGS Affected 

RF 

MXCSR Flags Affected 

None 


Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          A     A     A    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                         A    Lock prefix (F0h) preceding opcode.
                                         A    MODRM.mod = 11b
                                         A    MODRM.rm != 100b
                                         A    YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM                A    CR0.TS = 1.
Stack, #SS                               A    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  A    Memory address exceeding data segment limit or non-canonical.
                                         A    Null data segment used to reference memory.
Alignment check, #AC                     A    Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                    A     A    Instruction execution caused a page fault.

A — AVX2 exception 
VPGATHERQD    Conditionally Gather Doublewords, Quadword Indices

Conditionally loads doubleword values from memory using VSIB addressing with quadword indices. 
The instruction is of the form: 

VPGATHERQD dest, mem32[vm64x/y], mask 

The loading of each element of the destination register is conditional based on the value of the
corresponding element of the mask (second source operand). If the most-significant bit of the ith
element of the mask is set, the ith element of the destination is loaded from memory using the ith
address of the array of effective addresses calculated using VSIB addressing.

The index register is treated as an array of signed 64-bit values. Doubleword elements of the
destination for which the corresponding mask element is zero are not affected by the operation. If no
exceptions occur, the mask register is set to zero.

Execution of the instruction can be suspended by an exception if the exception is triggered by an
element other than the rightmost element loaded. When this happens, the destination register and the
mask operand may be observed as partially updated. Elements that have been loaded will have their 
mask elements set to zero. If any traps or faults are pending from elements that have been loaded, 
they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction 
breakpoint is not re-triggered when the instruction execution is resumed. 

See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode. 

There are 128-bit and 256-bit forms of this instruction.

XMM Encoding 

The destination is an XMM register. The first source operand is up to two 32-bit values located in 
memory. The second source operand (the mask) is an XMM register. The index vector is the two 
quadwords of an XMM register. The upper half (bits [127:64]) of the destination register and of the
mask register is cleared. Bits [255:128] of the YMM register that corresponds to the destination and
bits [255:128] of the YMM register that corresponds to the mask register are cleared.

YMM Encoding 

The destination is an XMM register. The first source operand is up to four 32-bit values located in 
memory. The second source operand (the mask) is an XMM register. The index vector is the four 
quadwords of a YMM register. Bits [255:128] of the YMM register that corresponds to the
destination and bits [255:128] of the YMM register that corresponds to the mask register are cleared.
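The relationship between the two encodings can be illustrated with a short model. It is a sketch under assumptions: the destination and mask are modelled as the four doublewords of an XMM register, `memory` is a hypothetical dictionary from byte address to doubleword value, indices are given as Python integers that are already signed, and displacement and exceptions are ignored. Passing two indices models the XMM (vm64x) form and passing four models the YMM (vm64y) form.

```python
def vpgatherqd(dest, mask, indices, base, scale, memory):
    """Model VPGATHERQD; returns the new four-doubleword (dest, mask) pair."""
    n = len(indices)                      # 2 for vm64x, 4 for vm64y
    assert n in (2, 4)
    out = [0] * 4                         # elements at and above n are cleared
    for i in range(n):
        if (mask[i] >> 31) & 1:           # most-significant bit of the doubleword mask
            out[i] = memory[base + indices[i] * scale]
        else:
            out[i] = dest[i]              # unselected elements are preserved
    return out, [0] * 4                   # mask register is zeroed on completion


memory = {0x100: 5, 0x104: 6, 0x108: 7, 0x10C: 8}
xmm_form = vpgatherqd([9, 9, 9, 9], [1 << 31, 0, 1 << 31, 1 << 31], [0, 1], 0x100, 4, memory)
ymm_form = vpgatherqd([9, 9, 9, 9], [1 << 31, 0, 1 << 31, 1 << 31], [0, 1, 2, 3], 0x100, 4, memory)
```

With two indices only the low two doublewords can receive data and the upper half of the result is zero; with four indices all four doublewords participate.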

Instruction Support

Form          Subset   Feature Flag
VPGATHERQD    AVX2     Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                              Encoding
Mnemonic                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPGATHERQD xmm1, vm64x, xmm2    C4    RXB.02           0.src2.0.01   91 /r
VPGATHERQD xmm1, vm64y, xmm2    C4    RXB.02           0.src2.1.01   91 /r


Related Instructions 

VGATHERDPD, VGATHERDPS, VGATHERQPD, VGATHERQPS, VPGATHERDD, VPGATHERDQ,
VPGATHERQQ

rFLAGS Affected 

RF 

MXCSR Flags Affected 

None 


Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          A     A     A    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                         A    Lock prefix (F0h) preceding opcode.
                                         A    MODRM.mod = 11b
                                         A    MODRM.rm != 100b
                                         A    YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM                A    CR0.TS = 1.
Stack, #SS                               A    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  A    Memory address exceeding data segment limit or non-canonical.
                                         A    Null data segment used to reference memory.
Alignment check, #AC                     A    Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                    A     A    Instruction execution caused a page fault.

A — AVX2 exception 
VPGATHERQQ    Conditionally Gather Quadwords, Quadword Indices

Conditionally loads quadword values from memory using VSIB addressing with quadword indices. 
The instruction is of the form: 

VPGATHERQQ dest, mem64[vm64x/y], mask 

The loading of each element of the destination register is conditional based on the value of the
corresponding element of the mask (second source operand). If the most-significant bit of the ith
element of the mask is set, the ith element of the destination is loaded from memory using the ith
address of the array of effective addresses calculated using VSIB addressing.

The index register is treated as an array of signed 64-bit values. Quadword elements of the destination 
for which the corresponding mask element is zero are not affected by the operation. If no exceptions 
occur, the mask register is set to zero. 

Execution of the instruction can be suspended by an exception if the exception is triggered by an
element other than the rightmost element loaded. When this happens, the destination register and the
mask operand may be observed as partially updated. Elements that have been loaded will have their 
mask elements set to zero. If any traps or faults are pending from elements that have been loaded, 
they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction 
breakpoint is not re-triggered when the instruction execution is resumed. 
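The suspend-and-resume behavior can be pictured with the following model. It is only a sketch: it assumes elements are processed from the lowest-numbered (rightmost) element upward, represents a page fault as a Python exception, and uses a hypothetical `memory` dictionary in which a missing address stands for an unmapped page. Indices are given as Python integers that are already signed.

```python
class PageFault(Exception):
    """Stands in for a #PF raised while loading one element."""


def vpgatherqq_attempt(dest, mask, indices, base, scale, memory):
    """Run one attempt of a modelled VPGATHERQQ, updating dest and mask in place."""
    for i in range(len(dest)):
        if (mask[i] >> 63) & 1:
            addr = base + indices[i] * scale
            if addr not in memory:
                raise PageFault(addr)      # earlier elements stay loaded, masks cleared
            dest[i] = memory[addr]
            mask[i] = 0                    # a loaded element has its mask element zeroed
    mask[:] = [0] * len(mask)              # completion: the whole mask register is zero


memory = {0x2000: 7}
dest, mask = [0, 0, 0, 0], [1 << 63] * 4
faulted = False
try:
    vpgatherqq_attempt(dest, mask, [0, 1, 2, 3], 0x2000, 8, memory)
except PageFault:
    faulted = True
partial = (list(dest), list(mask))         # state visible to the fault handler

memory.update({0x2008: 8, 0x2010: 9, 0x2018: 10})   # handler makes the data available
vpgatherqq_attempt(dest, mask, [0, 1, 2, 3], 0x2000, 8, memory)
```

Because the mask records which elements are already loaded, re-executing the instruction after the fault skips element 0 and loads only the remaining elements.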

See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode. 

There are 128-bit and 256-bit forms of this instruction.

XMM Encoding 

The destination is an XMM register. The first source operand is up to two 64-bit values located in 
memory. The second source operand (the mask) is an XMM register. The index vector is the two 
quadwords of an XMM register. Bits [255:128] of the YMM register that corresponds to the
destination and bits [255:128] of the YMM register that corresponds to the second source (mask)
operand are cleared.

YMM Encoding 

The destination is a YMM register. The first source operand is up to four 64-bit values located in 
memory. The second source operand (the mask) is a YMM register. The index vector is the four
quadwords of a YMM register.

Instruction Support

Form          Subset   Feature Flag
VPGATHERQQ    AVX2     Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                              Encoding
Mnemonic                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPGATHERQQ xmm1, vm64x, xmm2    C4    RXB.02           1.src2.0.01   91 /r
VPGATHERQQ ymm1, vm64y, ymm2    C4    RXB.02           1.src2.1.01   91 /r


Related Instructions 

VGATHERDPD, VGATHERDPS, VGATHERQPD, VGATHERQPS, VPGATHERDD, VPGATHERDQ,
VPGATHERQD

rFLAGS Affected 

RF 

MXCSR Flags Affected 

None 


Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          A     A     A    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                         A    Lock prefix (F0h) preceding opcode.
                                         A    MODRM.mod = 11b
                                         A    MODRM.rm != 100b
                                         A    YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM                A    CR0.TS = 1.
Stack, #SS                               A    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  A    Memory address exceeding data segment limit or non-canonical.
                                         A    Null data segment used to reference memory.
Alignment check, #AC                     A    Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                    A     A    Instruction execution caused a page fault.

A — AVX2 exception 
VPHADDBD    Packed Horizontal Add Signed Byte to Signed Doubleword

Adds four sets of four 8-bit signed integer values of the source and packs the sign-extended sums into 
the corresponding doubleword of the destination. 

There are two operands: VPHADDBD dest, src 

The destination is an XMM register and the source is either an XMM register or a 128-bit memory 
location. Bits [255:128] of the corresponding YMM register are cleared. 
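The arithmetic can be illustrated with a short model, assuming the 128-bit source is supplied as 16 bytes in little-endian element order. The function name mirrors the mnemonic but is otherwise just an illustration.

```python
import struct


def vphaddbd(src: bytes):
    """Sum each group of four signed bytes into one signed doubleword."""
    b = struct.unpack("<16b", src)                 # sixteen signed 8-bit values
    return [sum(b[i:i + 4]) for i in range(0, 16, 4)]


src = bytes([1, 2, 3, 4] + [0xFF] * 4 + [0x80] * 4 + [0x7F] * 4)
sums = vphaddbd(src)
```

Each sum of four signed bytes lies between -512 and 508, so it always fits in the doubleword result without overflow.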

Instruction Support

Form        Subset   Feature Flag
VPHADDBD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
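A software check that combines the feature flag with the operating-system conditions named in this instruction's exception table might look like the sketch below. The register values are assumed to have been read elsewhere. The function name is hypothetical, and the OSXSAVE bit position (27) comes from general CPUID knowledge rather than from this section.

```python
XOP_BIT = 11        # CPUID Fn8000_0001_ECX[XOP]
OSXSAVE_BIT = 27    # CPUID Fn0000_0001_ECX[OSXSAVE]; bit position assumed, not stated here


def xop_usable(ecx_8000_0001: int, ecx_0000_0001: int, xfeature_enabled_mask: int) -> bool:
    """Return True if XOP is reported and the OS has enabled the required state."""
    has_xop = (ecx_8000_0001 >> XOP_BIT) & 1
    has_osxsave = (ecx_0000_0001 >> OSXSAVE_BIT) & 1
    state_enabled = ((xfeature_enabled_mask >> 1) & 0b11) == 0b11   # bits [2:1] = 11b
    return bool(has_xop and has_osxsave and state_enabled)
```

Reading XFEATURE_ENABLED_MASK is itself only valid when OSXSAVE is set, so real code would test the two CPUID bits before reading it.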


Instruction Encoding

                                            Encoding
Mnemonic                      XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPHADDBD xmm1, xmm2/mem128    8F    RXB.09           0.1111.0.00   C2 /r
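For the register-to-register form, the fields in the table can be packed into bytes as sketched below. The sketch assumes that the R, X, and B bits are stored inverted (so registers 0 through 7 give 111b) and that the ModRM byte uses mod = 11b with the destination in the reg field; these are general encoding rules not restated in this section, and the helper name is hypothetical.

```python
def vphaddbd_reg(dest: int, src: int) -> bytes:
    """Encode VPHADDBD xmm<dest>, xmm<src> for registers 0-7."""
    assert 0 <= dest <= 7 and 0 <= src <= 7
    rxb_inverted = 0b111                       # no register-number extension in use
    byte1 = (rxb_inverted << 5) | 0x09         # RXB.09
    byte2 = (0 << 7) | (0b1111 << 3) | (0 << 2) | 0b00   # W.vvvv.L.pp = 0.1111.0.00
    modrm = (0b11 << 6) | (dest << 3) | src    # register-direct operand
    return bytes([0x8F, byte1, byte2, 0xC2, modrm])


encoded = vphaddbd_reg(0, 1)
```

Under these assumptions `encoded` is the five-byte sequence 8F E9 78 C2 C1.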


Related Instructions 

VPHADDBW, VPHADDBQ, VPHADDWD, VPHADDWQ, VPHADDDQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPHADDBQ    Packed Horizontal Add Signed Byte to Signed Quadword

Adds two sets of eight 8-bit signed integer values of the source and packs the sign-extended sums into 
the corresponding quadword of the destination. 

There are two operands: VPHADDBQ dest, src 

The destination is an XMM register and the source is either an XMM register or a 128-bit memory 
location. Bits [255:128] of the corresponding YMM register are cleared. 
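One way to sanity-check the description is to compare a direct eight-byte sum with three rounds of pairwise addition, which is the arithmetic that a byte-to-word, word-to-doubleword, and doubleword-to-quadword sequence of horizontal adds would perform. The sketch below assumes the source is given as 16 bytes in little-endian element order; all helper names are illustrative.

```python
def signed_bytes(src: bytes):
    """Reinterpret each byte as a signed 8-bit integer."""
    return [x - 256 if x >= 128 else x for x in src]


def pairwise(values):
    """Add each adjacent pair, halving the number of elements."""
    return [values[i] + values[i + 1] for i in range(0, len(values), 2)]


def vphaddbq(src: bytes):
    """Sum each group of eight signed bytes into one signed quadword."""
    b = signed_bytes(src)
    return [sum(b[0:8]), sum(b[8:16])]


src = bytes(range(0xF8, 0x100)) + bytes(range(8))   # -8..-1 followed by 0..7
direct = vphaddbq(src)
staged = pairwise(pairwise(pairwise(signed_bytes(src))))
```

Both routes give the same two quadword values, since no intermediate sum can overflow its widened element.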

Instruction Support

Form        Subset   Feature Flag
VPHADDBQ    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                            Encoding
Mnemonic                      XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPHADDBQ xmm1, xmm2/mem128    8F    RXB.09           0.1111.0.00   C3 /r


Related Instructions 

VPHADDBW, VPHADDBD, VPHADDWD, VPHADDWQ, VPHADDDQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPHADDBW    Packed Horizontal Add Signed Byte to Signed Word

Adds each adjacent pair of 8-bit signed integer values of the source and packs the sign-extended
16-bit integer result of each addition into the corresponding word element of the destination.

There are two operands: VPHADDBW dest, src 

The destination is an XMM register and the source is either an XMM register or a 128-bit memory 
location. Bits [255:128] of the corresponding YMM register are cleared. 
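The following sketch models the operation on raw register contents, assuming the source and result are 16-byte values in little-endian element order. It returns the packed 128-bit result so that the sign extension of each 16-bit sum is visible in the bytes.

```python
import struct


def vphaddbw(src: bytes) -> bytes:
    """Add adjacent signed bytes and pack the eight sums as signed 16-bit words."""
    b = struct.unpack("<16b", src)
    sums = [b[i] + b[i + 1] for i in range(0, 16, 2)]   # each sum is in -256..254
    return struct.pack("<8h", *sums)


result = vphaddbw(bytes([0x80, 0x80, 0x7F, 0x7F] + [0] * 12))
```

The first word is -256 (FF00h) and the second is 254 (00FEh), which shows why the result needs a full word even though each input is a byte.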

Instruction Support

Form        Subset   Feature Flag
VPHADDBW    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                            Encoding
Mnemonic                      XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPHADDBW xmm1, xmm2/mem128    8F    RXB.09           0.1111.0.00   C1 /r


Related Instructions 

VPHADDBD, VPHADDBQ, VPHADDWD, VPHADDWQ, VPHADDDQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPHADDDQ    Packed Horizontal Add Signed Doubleword to Signed Quadword

Adds each adjacent pair of signed doubleword integer values of the source and packs the
sign-extended sums into the corresponding quadword of the destination.

There are two operands: VPHADDDQ dest, src 

The source is either an XMM register or a 128-bit memory location and the destination is an XMM 
register. Bits [255:128] of the corresponding YMM register are cleared. 
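A brief model of the arithmetic follows, assuming the 128-bit source is supplied as 16 bytes in little-endian element order. The example values are chosen so that each sum falls outside the 32-bit signed range, which is the case the quadword result accommodates.

```python
import struct


def vphadddq(src: bytes):
    """Add adjacent signed doublewords, giving two signed quadword sums."""
    d = struct.unpack("<4i", src)
    return [d[0] + d[1], d[2] + d[3]]


src = struct.pack("<4i", 2**31 - 1, 2**31 - 1, -2**31, -2**31)
sums = vphadddq(src)
```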

Instruction Support

Form        Subset   Feature Flag
VPHADDDQ    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                            Encoding
Mnemonic                      XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPHADDDQ xmm1, xmm2/mem128    8F    RXB.09           0.1111.0.00   CB /r

Related Instructions

VPHADDBW, VPHADDBD, VPHADDBQ, VPHADDWD, VPHADDWQ

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPHADDUBD    Packed Horizontal Add Unsigned Byte to Doubleword

Adds four sets of four 8-bit unsigned integer values of the source and packs the sums into the
corresponding doublewords of the destination.

There are two operands: VPHADDUBD dest, src 

The destination is an XMM register and the source is either an XMM register or a 128-bit memory 
location. Bits [255:128] of the corresponding YMM register are cleared. 
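The unsigned interpretation can be illustrated as follows, again assuming a 16-byte source in little-endian element order. Indexing a Python `bytes` object yields values in the range 0 to 255, which matches the unsigned treatment of each source byte.

```python
def vphaddubd(src: bytes):
    """Sum each group of four unsigned bytes into one doubleword."""
    assert len(src) == 16
    return [sum(src[i:i + 4]) for i in range(0, 16, 4)]


sums = vphaddubd(bytes([0xFF] * 4 + [1, 2, 3, 4] + [0] * 8))
```

Four bytes of FFh sum to 1020 here, whereas a signed byte interpretation of the same bit patterns would give -4.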

Instruction Support

Form         Subset   Feature Flag
VPHADDUBD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                             Encoding
Mnemonic                       XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPHADDUBD xmm1, xmm2/mem128    8F    RXB.09           0.1111.0.00   D2 /r

Related Instructions 

VPHADDUBW, VPHADDUBQ, VPHADDUWD, VPHADDUWQ, VPHADDUDQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPHADDUBQ    Packed Horizontal Add Unsigned Byte to Quadword

Adds two sets of eight 8-bit unsigned integer values of the source and packs the sums into
the corresponding quadword of the destination.

There are two operands: VPHADDUBQ dest, src 

The destination is an XMM register and the source is either an XMM register or a 128-bit memory 
location. When the destination XMM register is written, bits [255:128] of the corresponding YMM 
register are cleared. 

Instruction Support

Form         Subset   Feature Flag
VPHADDUBQ    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                             Encoding
Mnemonic                       XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPHADDUBQ xmm1, xmm2/mem128    8F    RXB.09           0.1111.0.00   D3 /r

Related Instructions 

VPHADDUBW, VPHADDUBD, VPHADDUWD, VPHADDUWQ, VPHADDUDQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPHADDUBW    Packed Horizontal Add Unsigned Byte to Word

Adds each adjacent pair of 8-bit unsigned integer values of the source and packs the 16-bit integer 
sums to the corresponding word of the destination. 

There are two operands: VPHADDUBW dest, src 

The destination is an XMM register and the source is either an XMM register or a 128-bit memory 
location. Bits [255:128] of the corresponding YMM register are cleared. 

Instruction Support

Form         Subset   Feature Flag
VPHADDUBW    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                             Encoding
Mnemonic                       XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPHADDUBW xmm1, xmm2/mem128    8F    RXB.09           0.1111.0.00   D1 /r

Related Instructions 

VPHADDUBD, VPHADDUBQ, VPHADDUWD, VPHADDUWQ, VPHADDUDQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPHADDUDQ    Packed Horizontal Add Unsigned Doubleword to Quadword

Adds two adjacent pairs of 32-bit unsigned integer values of the source and packs the sums into the 
corresponding quadword of the destination. 

There are two operands: VPHADDUDQ dest, src 

The destination is an XMM register and the source is either an XMM register or a 128-bit memory 
location. Bits [255:128] of the corresponding YMM register are cleared. 

Instruction Support

Form         Subset   Feature Flag
VPHADDUDQ    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                             Encoding
Mnemonic                       XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPHADDUDQ xmm1, xmm2/mem128    8F    RXB.09           0.1111.0.00   DB /r


Related Instructions 

VPHADDUBW, VPHADDUBD, VPHADDUBQ, VPHADDUWD, VPHADDUWQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPHADDUWD    Packed Horizontal Add Unsigned Word to Doubleword

Adds four adjacent pairs of 16-bit unsigned integer values of the source and packs the sums into the 
corresponding doubleword of the destination. 

There are two operands: VPHADDUWD dest, src 

The destination is an XMM register and the source is either an XMM register or a 128-bit memory 
location. Bits [255:128] of the corresponding YMM register are cleared. 

Instruction Support

Form         Subset   Feature Flag
VPHADDUWD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                             Encoding
Mnemonic                       XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPHADDUWD xmm1, xmm2/mem128    8F    RXB.09           0.1111.0.00   D6 /r


Related Instructions 

VPHADDUBW, VPHADDUBD, VPHADDUBQ, VPHADDUWQ, VPHADDUDQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPHADDUWQ    Packed Horizontal Add Unsigned Word to Quadword

Adds two sets of four 16-bit unsigned integer values of the source and packs the sums into the
corresponding quadword element of the destination.

There are two operands: VPHADDUWQ dest, src 

The destination is an XMM register and the source is either an XMM register or a 128-bit memory 
location. Bits [255:128] of the corresponding YMM register are cleared. 
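As an illustration, the sketch below models the operation on a 16-byte source in little-endian element order, on the understanding that each quadword result is the sum of the four unsigned words that occupy the same 64 bits of the source.

```python
import struct


def vphadduwq(src: bytes):
    """Sum each group of four unsigned words into one quadword."""
    w = struct.unpack("<8H", src)          # eight unsigned 16-bit values
    return [sum(w[0:4]), sum(w[4:8])]


sums = vphadduwq(struct.pack("<8H", 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 1, 2, 3, 4))
```

The largest possible sum, four words of FFFFh, is 262140 and needs only 18 bits of the quadword.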

Instruction Support

Form         Subset   Feature Flag
VPHADDUWQ    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                             Encoding
Mnemonic                       XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPHADDUWQ xmm1, xmm2/mem128    8F    RXB.09           0.1111.0.00   D7 /r


Related Instructions 

VPHADDUBW, VPHADDUBD, VPHADDUBQ, VPHADDUWD, VPHADDUDQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
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Exceptions 


                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.W = 1.
                                       A     XOP.vvvv != 1111b.
                                       X     XOP.L = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
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VPHADDWD Packed Horizontal Add 

Signed Word to Signed Doubleword 

Adds four adjacent pairs of 16-bit signed integer values of the source and packs the sign-extended 
sums to the corresponding doubleword of the destination. 

There are two operands: VPHADDWD dest, src 

The destination is an XMM register and the source is either an XMM register or a 128-bit memory 
location. Bits [255:128] of the corresponding YMM register are cleared. 
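
As a rough reference model, the pairwise add can be written as follows. The names sign_extend and vphaddwd and the list layout (element 0 is the least-significant word) are assumptions made for illustration only.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def vphaddwd(src_words):
    """Model VPHADDWD: 8 raw 16-bit words in, 4 signed doubleword sums out."""
    if len(src_words) != 8:
        raise ValueError("expected eight words")
    s = [sign_extend(w, 16) for w in src_words]
    # Adjacent words 2i and 2i+1 feed doubleword i; a 32-bit sum cannot overflow.
    return [s[2 * i] + s[2 * i + 1] for i in range(4)]
```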

Instruction Support 


Form         Subset  Feature Flag
VPHADDWD     XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                               Encoding
Mnemonic                        XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHADDWD xmm1, xmm2/mem128      8F   RXB.09          0.1111.0.00  C6 /r


Related Instructions 

VPHADDBW, VPHADDBD, VPHADDBQ, VPHADDWQ, VPHADDDQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
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Exceptions 


                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.W = 1.
                                       A     XOP.vvvv != 1111b.
                                       X     XOP.L = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
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VPHADDWQ Packed Horizontal Add 

Signed Word to Signed Quadword 

Adds four successive pairs of 16-bit signed integer values of the source and packs the sign-extended 
sums to the corresponding quadword of the destination. 

There are two operands: VPHADDWQ dest, src 

The destination is an XMM register and the source is either an XMM register or a 128-bit memory 
location. Bits [255:128] of the corresponding YMM register are cleared. 
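
The sketch below models the operation on packed 128-bit values, which makes the element layout explicit. It assumes that each destination quadword is the sum of the four sign-extended words of the corresponding source quadword; the function name vphaddwq and the use of Python integers as 128-bit registers are illustrative choices, not part of the instruction definition.

```python
def vphaddwq(src):
    """Model VPHADDWQ: src and the return value are 128-bit unsigned integers."""
    def signed_word(index):
        w = (src >> (16 * index)) & 0xFFFF
        return w - 0x10000 if w & 0x8000 else w

    result = 0
    for q in range(2):
        total = sum(signed_word(4 * q + k) for k in range(4))
        # Store the signed sum as a 64-bit two's-complement pattern.
        result |= (total & 0xFFFFFFFFFFFFFFFF) << (64 * q)
    return result
```

For example, a low quadword holding four words of FFFFh (each -1) produces FFFF_FFFF_FFFF_FFFCh (-4) in the low quadword of the result.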

Instruction Support 


Form         Subset  Feature Flag
VPHADDWQ     XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                               Encoding
Mnemonic                        XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHADDWQ xmm1, xmm2/mem128      8F   RXB.09          0.1111.0.00  C7 /r
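
To show how the fields in the encoding table combine into bytes, the following sketch assembles the register-to-register form of a two-operand XOP instruction. This section does not restate the bit-level layout of the XOP prefix, so the sketch assumes the usual three-byte form: the 8Fh escape byte, then a byte holding the inverted R, X, and B bits above the 5-bit map_select, then a byte holding W, the inverted vvvv field, L, and pp. The function name encode_xop_rr is hypothetical.

```python
def encode_xop_rr(opcode, dest, src, map_select=0x09):
    """Encode a two-operand XOP instruction, register-to-register form.

    dest and src are XMM register numbers 0..15. W = 0, vvvv = 1111b,
    L = 0, and pp = 00, matching the encoding table above.
    """
    if not (0 <= dest <= 15 and 0 <= src <= 15):
        raise ValueError("register number out of range")
    r = (dest >> 3) & 1            # extends ModRM.reg
    b = (src >> 3) & 1             # extends ModRM.r/m
    # R, X, and B are stored inverted; X is unused here, so its stored bit is 1.
    byte1 = ((r ^ 1) << 7) | (1 << 6) | ((b ^ 1) << 5) | map_select
    byte2 = (0 << 7) | (0b1111 << 3) | (0 << 2) | 0b00   # W.vvvv.L.pp
    modrm = 0xC0 | ((dest & 7) << 3) | (src & 7)          # mod = 11b
    return bytes([0x8F, byte1, byte2, opcode, modrm])


# VPHADDWQ xmm2, xmm6
print(encode_xop_rr(0xC7, 2, 6).hex(" "))
```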


Related Instructions 

VPHADDBW, VPHADDBD, VPHADDBQ, VPHADDWD, VPHADDDQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
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Exceptions 


                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.W = 1.
                                       A     XOP.vvvv != 1111b.
                                       X     XOP.L = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
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VPHSUBBW Packed Horizontal Subtract 

Signed Byte to Signed Word 

Subtracts the most significant signed integer byte from the least significant signed integer byte of
each word element in the source and packs the sign-extended 16-bit integer differences into the
destination.

There are two operands: VPHSUBBW dest, src 

The destination is an XMM register and the source is either an XMM register or a 128-bit memory 
location. When the destination is written, bits [255:128] of the corresponding YMM register are 
cleared. 
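
A minimal model of the byte-to-word subtraction is sketched below. The function name vphsubbw and the list layout (element 0 is the least-significant byte) are assumptions for illustration.

```python
def vphsubbw(src_bytes):
    """Model VPHSUBBW: 16 raw bytes in, 8 signed word differences out."""
    if len(src_bytes) != 16:
        raise ValueError("expected sixteen bytes")

    def signed_byte(b):
        b &= 0xFF
        return b - 0x100 if b & 0x80 else b

    # For word i, the low byte is element 2i and the high byte is element 2i+1;
    # the result is low minus high, which always fits in 16 bits.
    return [signed_byte(src_bytes[2 * i]) - signed_byte(src_bytes[2 * i + 1])
            for i in range(8)]
```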

Instruction Support 


Form         Subset  Feature Flag
VPHSUBBW     XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                               Encoding
Mnemonic                        XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHSUBBW xmm1, xmm2/mem128      8F   RXB.09          0.1111.0.00  E1 /r


Related Instructions 

VPHSUBWD, VPHSUBDQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
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Exceptions 


                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.W = 1.
                                       A     XOP.vvvv != 1111b.
                                       X     XOP.L = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
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VPHSUBDQ Packed Horizontal Subtract 

Signed Doubleword to Signed Quadword 

Subtracts the most significant signed integer doubleword from the least significant signed integer 
doubleword of each quadword in the source and packs the sign-extended 64-bit integer differences 
into the corresponding quadword element of the destination. 

There are two operands: VPHSUBDQ dest, src 

The destination is an XMM register and the source is either an XMM register or a 128-bit memory 
location. When the destination is written, bits [255:128] of the corresponding YMM register are 
cleared. 
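
The doubleword form follows the same pattern with wider elements. In this sketch the name vphsubdq and the list layout (element 0 is the least-significant doubleword) are illustrative assumptions.

```python
def vphsubdq(src_dwords):
    """Model VPHSUBDQ: 4 raw doublewords in, 2 signed quadword differences out."""
    if len(src_dwords) != 4:
        raise ValueError("expected four doublewords")

    def signed_dword(d):
        d &= 0xFFFFFFFF
        return d - 0x100000000 if d & 0x80000000 else d

    # Quadword i is (low doubleword) - (high doubleword) of source quadword i.
    return [signed_dword(src_dwords[2 * i]) - signed_dword(src_dwords[2 * i + 1])
            for i in range(2)]
```

The extreme case 8000_0000h minus 7FFF_FFFFh yields -4294967295, which shows why the difference needs more than 32 bits.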

Instruction Support 


Form         Subset  Feature Flag
VPHSUBDQ     XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                               Encoding
Mnemonic                        XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHSUBDQ xmm1, xmm2/mem128      8F   RXB.09          0.1111.0.00  E3 /r


Related Instructions 

VPHSUBBW, VPHSUBWD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
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Exceptions 


                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.W = 1.
                                       A     XOP.vvvv != 1111b.
                                       X     XOP.L = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
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VPHSUBWD Packed Horizontal Subtract 

Signed Word to Signed Doubleword 

Subtracts the most significant signed integer word from the least significant signed integer word of
each doubleword of the source and packs the sign-extended 32-bit integer differences into the
destination.

There are two operands: VPHSUBWD dest, src 

The destination is an XMM register and the source is either an XMM register or a 128-bit memory 
location. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 
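
The sketch below returns each difference as the 32-bit two's-complement pattern that would be stored in the destination, which makes the sign extension visible. The name vphsubwd and the list layout (element 0 is the least-significant word) are assumptions for illustration.

```python
def vphsubwd(src_words):
    """Model VPHSUBWD: 8 raw words in, 4 doubleword bit patterns out."""
    if len(src_words) != 8:
        raise ValueError("expected eight words")

    def signed_word(w):
        w &= 0xFFFF
        return w - 0x10000 if w & 0x8000 else w

    result = []
    for i in range(4):
        diff = signed_word(src_words[2 * i]) - signed_word(src_words[2 * i + 1])
        result.append(diff & 0xFFFFFFFF)   # 32-bit two's-complement encoding
    return result
```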

Instruction Support 


Form         Subset  Feature Flag
VPHSUBWD     XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                               Encoding
Mnemonic                        XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHSUBWD xmm1, xmm2/mem128      8F   RXB.09          0.1111.0.00  E2 /r


Related Instructions 

VPHSUBBW, VPHSUBDQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
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Exceptions 


                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.W = 1.
                                       A     XOP.vvvv != 1111b.
                                       X     XOP.L = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
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VPMACSDD Packed Multiply Accumulate 

Signed Doubleword to Signed Doubleword 

Multiplies each packed 32-bit signed integer value of the first source by the corresponding value of
the second source, adds the corresponding value of the third source to the 64-bit signed integer
product, and writes four 32-bit sums to the destination.

No saturation is performed on the sum. When the result of the multiplication causes non-zero values
to be set in the upper 32 bits of the 64-bit product, they are ignored. When the result of the add
overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set). In both cases, only
the signed low-order 32 bits of the result are written to the destination.

There are four operands: VPMACSDD dest, src1, src2, src3        dest = src1 * src2 + src3

The destination (dest) is an XMM register specified by ModRM.reg. When the destination is written,
bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either
an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third
source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.

When the third source designates the same XMM register as the destination, the XMM register 
behaves as an accumulator. 
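
The truncating behavior can be illustrated with a short reference model. The names wrap32 and vpmacsdd and the use of lists of signed Python integers (element 0 is the least-significant doubleword) are assumptions for illustration only.

```python
def wrap32(value):
    """Keep the low-order 32 bits and reinterpret them as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def vpmacsdd(src1, src2, src3):
    """Model VPMACSDD: three lists of 4 signed doublewords, no saturation."""
    if not len(src1) == len(src2) == len(src3) == 4:
        raise ValueError("expected four doublewords per operand")
    # Upper product bits and any carry out of bit 31 are simply discarded.
    return [wrap32(a * b + c) for a, b, c in zip(src1, src2, src3)]
```

With these definitions, 0x10000 * 0x10000 + 5 yields 5 because the product occupies only bits above bit 31, and 0x7FFFFFFF * 1 + 1 wraps to -2147483648.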

Instruction Support

Form         Subset  Feature Flag
VPMACSDD     XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                          Encoding
Mnemonic                                   XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSDD xmm1, xmm2, xmm3/mem128, xmm4     8F   RXB.08          0.src1.0.00  9E /r ib
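
The four-operand form places src1 in XOP.vvvv and src3 in bits [7:4] of the immediate byte. The sketch below assembles the register-only form under the assumption, not restated in this section, that the XOP prefix stores R, X, B, and vvvv in inverted form; the function name encode_vpmacsdd_rr is hypothetical.

```python
def encode_vpmacsdd_rr(dest, src1, src2, src3):
    """Encode VPMACSDD dest, src1, src2, src3 with all operands in XMM0..XMM15."""
    if any(not 0 <= reg <= 15 for reg in (dest, src1, src2, src3)):
        raise ValueError("register number out of range")
    r = (dest >> 3) & 1                  # extends ModRM.reg
    b = (src2 >> 3) & 1                  # extends ModRM.r/m
    byte1 = ((r ^ 1) << 7) | (1 << 6) | ((b ^ 1) << 5) | 0x08   # map_select = 08h
    byte2 = (~src1 & 0xF) << 3           # W = 0, vvvv = inverted src1, L = 0, pp = 00
    modrm = 0xC0 | ((dest & 7) << 3) | (src2 & 7)               # mod = 11b
    imm8 = src3 << 4                     # src3 register number in bits [7:4]
    return bytes([0x8F, byte1, byte2, 0x9E, modrm, imm8])


# VPMACSDD xmm1, xmm2, xmm3, xmm4
print(encode_vpmacsdd_rr(1, 2, 3, 4).hex(" "))
```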

Related Instructions 

VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSSDQL, 
VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
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Exceptions 


                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.W = 1.
                                       X     XOP.L = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 





VPMACSDQH Packed Multiply Accumulate 

Signed High Doubleword to Signed Quadword 

Multiplies the second 32-bit signed integer value of the first source by the corresponding value of the 
second source, then adds the low-order 64-bit signed integer value of the third source to the 64-bit 
signed integer product. Simultaneously, multiplies the fourth 32-bit signed integer value of the first 
source by the fourth 32-bit signed integer value of the second source, then adds the high-order 64-bit 
signed integer value of the third source to the 64-bit signed integer product. Writes two 64-bit sums to 
the destination. 

No saturation is performed on the sum. When the result of the add overflows, the carry is ignored 
(neither the overflow nor carry bit in rFLAGS is set). 

There are four operands: VPMACSDQH dest, src1, src2, src3        dest = src1 * src2 + src3

The destination (dest) is an XMM register specified by ModRM.reg. When the destination is written,
bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field; the second source (src2)
is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the
third source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.

When the third source designates the same XMM register as the destination, the XMM register 
behaves as an accumulator. 
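
The element selection can be sketched as follows, taking "second" and "fourth" to mean doubleword elements 1 and 3 when counting from the least-significant element as 0. The names wrap64 and vpmacsdqh and the list representation are assumptions for illustration.

```python
def wrap64(value):
    """Keep the low-order 64 bits and reinterpret them as a signed integer."""
    value &= 0xFFFFFFFFFFFFFFFF
    return value - (1 << 64) if value >> 63 else value


def vpmacsdqh(src1, src2, src3):
    """src1, src2: 4 signed doublewords; src3: 2 signed quadwords."""
    if len(src1) != 4 or len(src2) != 4 or len(src3) != 2:
        raise ValueError("unexpected operand shape")
    # The high doubleword of each source quadword is multiplied; the carry
    # out of the 64-bit add is ignored.
    return [wrap64(src1[1] * src2[1] + src3[0]),
            wrap64(src1[3] * src2[3] + src3[1])]
```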

Instruction Support 


Form         Subset  Feature Flag
VPMACSDQH    XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                          Encoding
Mnemonic                                   XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSDQH xmm1, xmm2, xmm3/mem128, xmm4    8F   RXB.01000       0.src1.0.00  9F /r ib

Related Instructions 

VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD, 
VPMACSSDQL, VPMACSSDQH, VPMACSDQL, VPMADCSSWD, VPMADCSWD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
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Exceptions 


                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.W = 1.
                                       X     XOP.L = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
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VPMACSDQL Packed Multiply Accumulate 

Signed Low Doubleword to Signed Quadword 

Multiplies the low-order 32-bit signed integer value of the first source by the corresponding value of
the second source, then adds the low-order 64-bit signed integer value of the third source to the 64-bit
signed integer product. Simultaneously, multiplies the third 32-bit signed integer value of the first
source by the corresponding value of the second source, then adds the high-order 64-bit signed
integer value of the third source to the 64-bit signed integer product. Writes two 64-bit sums to the
destination register.

No saturation is performed on the sum. When the result of the add overflows, the carry is ignored
(neither the overflow nor carry bit in rFLAGS is set). Only the low-order 64 bits of each result are
written to the destination.

There are four operands: VPMACSDQL dest, src1, src2, src3        dest = src1 * src2 + src3

The destination (dest) is an XMM register specified by ModRM.reg. When the destination is written, bits
[255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either
an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third
source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.

When src3 designates the same XMM register as the dest register, the XMM register behaves as an 
accumulator. 
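
The accumulator usage can be illustrated by feeding the result of one step back in as src3 of the next. In this sketch the names wrap64, vpmacsdql, and accumulate_low_products are hypothetical, lists stand in for registers (element 0 is the least-significant element), and "third" is taken to mean doubleword element 2.

```python
def wrap64(value):
    """Keep the low-order 64 bits and reinterpret them as a signed integer."""
    value &= 0xFFFFFFFFFFFFFFFF
    return value - (1 << 64) if value >> 63 else value


def vpmacsdql(src1, src2, src3):
    """src1, src2: 4 signed doublewords; src3: 2 signed quadwords."""
    return [wrap64(src1[0] * src2[0] + src3[0]),
            wrap64(src1[2] * src2[2] + src3[1])]


def accumulate_low_products(a_blocks, b_blocks):
    """Sum the even-element products of successive 128-bit blocks."""
    acc = [0, 0]                       # plays the role of the dest/src3 register
    for a, b in zip(a_blocks, b_blocks):
        acc = vpmacsdql(a, b, acc)     # src3 names the same register as dest
    return acc
```

Each iteration corresponds to one instruction in which dest and src3 name the same register, so the running sums stay in that register across the loop.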

Instruction Support

Form         Subset  Feature Flag
VPMACSDQL    XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                          Encoding
Mnemonic                                   XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSDQL xmm1, xmm2, xmm3/mem128, xmm4    8F   RXB.08          0.src1.0.00  97 /r ib

Related Instructions 

VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD, 
VPMACSSDQL, VPMACSSDQH, VPMACSDQH, VPMADCSSWD, VPMADCSWD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
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Exceptions 


                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.W = 1.
                                       X     XOP.L = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
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VPMACSSDD Packed Multiply Accumulate with Saturation 

Signed Doubleword to Signed Doubleword 

Multiplies each packed 32-bit signed integer value of the first source by the corresponding value of 
the second source, then adds the corresponding packed 32-bit signed integer value of the third source 
to each 64-bit signed integer product. Writes four saturated 32-bit sums to the destination. 

Out of range results of the addition are saturated to fit into a signed 32-bit integer. For each packed 
value of the destination, when the value is larger than the largest signed 32-bit integer, it is saturated 
to 7FFF_FFFFh, and when the value is smaller than the smallest signed 32-bit integer, it is saturated 
to 8000_0000h. 

There are four operands: VPMACSSDD dest, src1, src2, src3        dest = src1 * src2 + src3

The destination (dest) is an XMM register specified by ModRM.reg. When the destination is written,
bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either
an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third
source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.

When src3 designates the same XMM register as the dest register, the XMM register behaves as an 
accumulator. 
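
Saturation replaces the wraparound of the non-saturating form with clamping to the signed 32-bit range. A minimal sketch, with saturate32 and vpmacssdd as illustrative names and lists of signed Python integers standing in for the registers:

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000


def saturate32(value):
    """Clamp value to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))


def vpmacssdd(src1, src2, src3):
    """Model VPMACSSDD: three lists of 4 signed doublewords, saturated sums."""
    if not len(src1) == len(src2) == len(src3) == 4:
        raise ValueError("expected four doublewords per operand")
    # The product and sum are formed at full precision before clamping.
    return [saturate32(a * b + c) for a, b, c in zip(src1, src2, src3)]
```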


Instruction Support

Form         Subset  Feature Flag
VPMACSSDD    XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                          Encoding
Mnemonic                                   XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSSDD xmm1, xmm2, xmm3/mem128, xmm4    8F   RXB.08          0.src1.0.00  8E /r ib

Related Instructions 

VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSDD, VPMACSSDQL, 
VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
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Exceptions 


                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.W = 1.
                                       X     XOP.L = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
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VPMACSSDQH Packed Multiply Accumulate with Saturation 

Signed High Doubleword to Signed Quadword 

Multiplies the second 32-bit signed integer value of the first source by the corresponding value of the
second source, then adds the low-order 64-bit signed integer value of the third source to the 64-bit
signed integer product. Simultaneously, multiplies the fourth 32-bit signed integer value of the first
source by the corresponding value of the second source, then adds the high-order 64-bit signed
integer value of the third source to the 64-bit signed integer product. Writes two saturated sums to the
destination.

Out of range results of the addition are saturated to fit into a signed 64-bit integer. For each packed
value of the destination, when the value is larger than the largest signed 64-bit integer, it is saturated
to 7FFF_FFFF_FFFF_FFFFh, and when the value is smaller than the smallest signed 64-bit integer, it
is saturated to 8000_0000_0000_0000h.

There are four operands: VPMACSSDQH dest, src1, src2, src3        dest = src1 * src2 + src3

The destination (dest) is an XMM register specified by ModRM.reg. When the destination XMM
register is written, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either
an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third
source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.

When src3 designates the same XMM register as the dest register, the XMM register behaves as an 
accumulator. 
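
A compact model of the saturating high-doubleword form is sketched below. It takes "second" and "fourth" to mean doubleword elements 1 and 3 counting from 0; the names saturate64 and vpmacssdqh and the list representation are assumptions for illustration.

```python
INT64_MAX = 0x7FFFFFFFFFFFFFFF
INT64_MIN = -0x8000000000000000


def saturate64(value):
    """Clamp value to the signed 64-bit range."""
    return max(INT64_MIN, min(INT64_MAX, value))


def vpmacssdqh(src1, src2, src3):
    """src1, src2: 4 signed doublewords; src3: 2 signed quadwords."""
    if len(src1) != 4 or len(src2) != 4 or len(src3) != 2:
        raise ValueError("unexpected operand shape")
    return [saturate64(src1[1] * src2[1] + src3[0]),
            saturate64(src1[3] * src2[3] + src3[1])]
```

Because a 32-bit by 32-bit signed product always fits in 64 bits, only the addition of src3 can push a result out of range.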

Instruction Support 


Form         Subset  Feature Flag
VPMACSSDQH   XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                          Encoding
Mnemonic                                   XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSSDQH xmm1, xmm2, xmm3/mem128, xmm4   8F   RXB.08          0.src1.0.00  8F /r ib

Related Instructions 

VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD, 
VPMACSSDQL, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
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Exceptions 


                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.W = 1.
                                       X     XOP.L = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
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VPMACSSDQL Packed Multiply Accumulate with Saturation 

Signed Low Doubleword to Signed Quadword 

Multiplies the low-order 32-bit signed integer value of the first source by the corresponding value of 
the second source, then adds the low-order 64-bit signed integer value of the third source to the 64-bit 
signed integer product. Simultaneously, multiplies the third 32-bit signed integer value of the first 
source by the third 32-bit signed integer value of the second source, then adds the high-order 64-bit 
signed integer value of the third source to the 64-bit signed integer product. Writes two saturated 
sums to the destination. 

Out of range results of the addition are saturated to fit into a signed 64-bit integer. For each packed 
value of the destination, when the value is larger than the largest signed 64-bit integer, it is saturated 
to 7FFF_FFFF_FFFF_FFFFh, and when the value is smaller than the smallest signed 64-bit integer, it 
is saturated to 8000_0000_0000_0000h. 

There are four operands: VPMACSSDQL dest, src1, src2, src3        dest = src1 * src2 + src3

The destination (dest) is an XMM register specified by ModRM.reg. When the destination is
written, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either
an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third
source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.

When src3 designates the same XMM register as the dest register, the XMM register behaves as an 
accumulator. 
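
The low-doubleword form differs only in which elements are multiplied. The sketch below assumes doubleword elements 0 and 2 (counting from the least-significant element) are selected; vpmacssdql is an illustrative name and lists of signed Python integers stand in for the registers.

```python
def vpmacssdql(src1, src2, src3):
    """src1, src2: 4 signed doublewords; src3: 2 signed quadwords."""
    if len(src1) != 4 or len(src2) != 4 or len(src3) != 2:
        raise ValueError("unexpected operand shape")

    def saturate64(value):
        # Clamp to 8000_0000_0000_0000h .. 7FFF_FFFF_FFFF_FFFFh.
        return max(-(1 << 63), min((1 << 63) - 1, value))

    return [saturate64(src1[0] * src2[0] + src3[0]),
            saturate64(src1[2] * src2[2] + src3[1])]
```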

Instruction Support 


Form         Subset  Feature Flag
VPMACSSDQL   XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                          Encoding
Mnemonic                                   XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSSDQL xmm1, xmm2, xmm3/mem128, xmm4   8F   RXB.08          0.src1.0.00  87 /r ib

Related Instructions 

VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD, 
VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
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Exceptions 


                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.W = 1.
                                       X     XOP.L = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
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VPMACSSWD Packed Multiply Accumulate with Saturation 

Signed Word to Signed Doubleword 

Multiplies the odd-numbered packed 16-bit signed integer values of the first source by the
corresponding values of the second source, then adds the corresponding packed 32-bit signed integer
values of the third source to the 32-bit signed integer products. Writes four saturated sums to the
destination.

Out of range results of the addition are saturated to fit into a signed 32-bit integer. For each packed
value of the destination, when the value is larger than the largest signed 32-bit integer, it is saturated
to 7FFF_FFFFh, and when the value is smaller than the smallest signed 32-bit integer, it is saturated
to 8000_0000h.

There are four operands: VPMACSSWD dest, src1, src2, src3        dest = src1 * src2 + src3

The destination (dest) is an XMM register specified by ModRM.reg. When the destination XMM
register is written, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field; the second source (src2)
is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the
third source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.

When src3 designates the same XMM register as the dest register, the XMM register behaves as an 
accumulator. 
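
The word-to-doubleword form can be sketched in the same way. The sketch assumes that "odd-numbered" counts word elements from 0 at the least-significant end, so doubleword i of the result uses word 2i+1 of each multiplicand; that reading, the name vpmacsswd, and the list representation are assumptions for illustration.

```python
def vpmacsswd(src1, src2, src3):
    """src1, src2: 8 signed words; src3: 4 signed doublewords."""
    if len(src1) != 8 or len(src2) != 8 or len(src3) != 4:
        raise ValueError("unexpected operand shape")

    def saturate32(value):
        # Clamp to 8000_0000h .. 7FFF_FFFFh.
        return max(-0x80000000, min(0x7FFFFFFF, value))

    # Assumed element selection: word 2i+1 of src1 and src2 feeds doubleword i.
    return [saturate32(src1[2 * i + 1] * src2[2 * i + 1] + src3[i])
            for i in range(4)]
```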

Instruction Support

Form         Subset  Feature Flag
VPMACSSWD    XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                    Encoding
Mnemonic                                   XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSSWD xmm1, xmm2, xmm3/mem128, xmm4    8F   RXB.08          0.src1.0.00  86 /r ib
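To make the encoding row concrete, the sketch below assembles the bytes of a register-register four-operand XOP instruction. It assumes the XOP prefix layout documented elsewhere in this manual (escape byte 8Fh, with the R, X, B, and vvvv fields stored in inverted form). The helper name `encode_xop_4op` and its parameters are hypothetical, and the output has not been checked against an assembler.

```python
def encode_xop_4op(opcode, dest, src1, src2, src3, map_select=0x08, w=0, l=0, pp=0):
    """Hypothetical helper: encode `op dest, src1, src2, src3` with all operands in registers.

    Registers are given as integers 0-15 (xmm0-xmm15).
    """
    r = (dest >> 3) & 1  # extends ModRM.reg
    x = 0                # no SIB index register in the register-register form
    b = (src2 >> 3) & 1  # extends ModRM.r/m
    # Second prefix byte: inverted R, X, B followed by the 5-bit map_select field.
    byte1 = ((r ^ 1) << 7) | ((x ^ 1) << 6) | ((b ^ 1) << 5) | (map_select & 0x1F)
    # Third prefix byte: W, inverted vvvv (src1), L, pp.
    byte2 = (w << 7) | ((~src1 & 0xF) << 3) | (l << 2) | (pp & 0x3)
    modrm = 0xC0 | ((dest & 7) << 3) | (src2 & 7)  # mod = 11b selects a register operand
    imm8 = (src3 & 0xF) << 4                       # src3 lives in bits [7:4] of the immediate
    return bytes([0x8F, byte1, byte2, opcode, modrm, imm8])
```

Under these assumptions, `encode_xop_4op(0x86, 1, 2, 3, 4)` (VPMACSSWD xmm1, xmm2, xmm3, xmm4) returns the six bytes 8F E8 68 86 CB 40.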

Related Instructions 

VPMACSSWW, VPMACSWW, VPMACSWD, VPMACSSDD, VPMACSDD, VPMACSSDQL, 
VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X    Instruction not supported, as indicated by CPUID feature identifier.
                            X     X          XOP instructions are only recognized in protected mode.
                                        X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X    XOP.W = 1.
                                        X    XOP.L = 1.
                                        X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X    Lock prefix (F0h) preceding opcode.
Device not available, #NM               X    CR0.TS = 1.
Stack, #SS                              X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Page fault, #PF                         X    Instruction execution caused a page fault.
Alignment check, #AC                    X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPMACSSWW Packed Multiply Accumulate with Saturation

Signed Word to Signed Word 

Multiplies each packed 16-bit signed integer value of the first source by the corresponding packed
16-bit signed integer value of the second source, then adds the corresponding packed 16-bit signed
integer value of the third source to the 32-bit signed integer products. Writes eight saturated sums to
the destination.

Out of range results of the addition are saturated to fit into a signed 16-bit integer. For each packed 
value of the destination, when the value is larger than the largest signed 16-bit integer, it is saturated 
to 7FFFh, and when the value is smaller than the smallest signed 16-bit integer, it is saturated to 
8000h. 

There are four operands:

VPMACSSWW dest, src1, src2, src3        dest = src1 * src2 + src3

The destination is an XMM register specified by ModRM.reg. When the destination is written, bits
[255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either
an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third
source (src3) is an XMM register specified by bits [7:4] of an immediate byte.

When src3 and dest designate the same XMM register, this register behaves as an accumulator. 
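As a rough illustration of the word-to-word saturating form, the sketch below computes each product at full precision, adds the third source, and clamps the sum to 16 bits. The names `saturate16` and `vpmacssww` and the list representation are assumptions for illustration only.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def saturate16(value):
    """Clamp an integer to the signed 16-bit range (8000h to 7FFFh)."""
    return max(INT16_MIN, min(INT16_MAX, value))


def vpmacssww(src1, src2, src3):
    """Illustrative model: all three sources hold eight signed 16-bit words."""
    assert len(src1) == len(src2) == len(src3) == 8
    # Every word position participates; the sum is saturated, not truncated.
    return [saturate16(a * b + c) for a, b, c in zip(src1, src2, src3)]
```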

Instruction Support

Form         Subset  Feature Flag
VPMACSSWW    XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                    Encoding
Mnemonic                                   XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSSWW xmm1, xmm2, xmm3/mem128, xmm4    8F   RXB.08          0.src1.0.00  85 /r ib

Related Instructions 

VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD, VPMACSSDQL,
VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X    Instruction not supported, as indicated by CPUID feature identifier.
                            X     X          XOP instructions are only recognized in protected mode.
                                        X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X    XOP.W = 1.
                                        X    XOP.L = 1.
                                        X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X    Lock prefix (F0h) preceding opcode.
Device not available, #NM               X    CR0.TS = 1.
Stack, #SS                              X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Page fault, #PF                         X    Instruction execution caused a page fault.
Alignment check, #AC                    X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPMACSWD Packed Multiply Accumulate

Signed Word to Signed Doubleword 

Multiplies each odd-numbered packed 16-bit signed integer value of the first source by the
corresponding value of the second source, then adds the corresponding packed 32-bit signed integer value
of the third source to the 32-bit signed integer products. Writes four 32-bit results to the destination.

When the result of the add overflows, the carry is ignored (neither the overflow nor carry bit in 
rFLAGS is set). Only the low-order 32 bits of the result are written to the destination. 

There are four operands:

VPMACSWD dest, src1, src2, src3        dest = src1 * src2 + src3

The destination (dest) register is an XMM register specified by ModRM.reg. When the destination
XMM register is written, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either
an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third
source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.

When src3 designates the same XMM register as the dest register, the XMM register behaves as an 
accumulator. 
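The non-saturating behavior, where only the low-order 32 bits of each sum survive, can be sketched as follows. The names `wrap32` and `vpmacswd` are hypothetical, and Python lists stand in for the packed registers (element 0 is the least-significant element).

```python
def wrap32(value):
    """Keep the low-order 32 bits and reinterpret them as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def vpmacswd(src1, src2, src3):
    """Illustrative model: src1 and src2 hold eight signed words, src3 four signed doublewords."""
    assert len(src1) == len(src2) == 8 and len(src3) == 4
    # Odd-numbered words are multiplied; overflow of the add wraps silently.
    return [wrap32(src1[2 * i + 1] * src2[2 * i + 1] + src3[i]) for i in range(4)]
```

For instance, adding 1 to an accumulator element that already holds 7FFF_FFFFh wraps to 8000_0000h instead of clamping.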


Instruction Support

Form         Subset  Feature Flag
VPMACSWD     XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

                                                    Encoding
Mnemonic                                   XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSWD xmm1, xmm2, xmm3/mem128, xmm4     8F   RXB.08          0.src1.0.00  96 /r ib

Related Instructions 

VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSSDD, VPMACSDD, VPMACSSDQL,
VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X    Instruction not supported, as indicated by CPUID feature identifier.
                            X     X          XOP instructions are only recognized in protected mode.
                                        X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X    XOP.W = 1.
                                        X    XOP.L = 1.
                                        X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X    Lock prefix (F0h) preceding opcode.
Device not available, #NM               X    CR0.TS = 1.
Stack, #SS                              X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Page fault, #PF                         X    Instruction execution caused a page fault.
Alignment check, #AC                    X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPMACSWW Packed Multiply Accumulate

Signed Word to Signed Word 

Multiplies each packed 16-bit signed integer value of the first source by the corresponding value of 
the second source, then adds the corresponding packed 16-bit signed integer value of the third source 
to each 32-bit signed integer product. Writes eight 16-bit results to the destination. 

No saturation is performed on the sum. When the result of the multiplication causes non-zero values
to be set in the upper 16 bits of the 32-bit result, they are ignored. When the result of the add
overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set). In both cases, only
the signed low-order 16 bits of the result are written to the destination.

There are four operands:

VPMACSWW dest, src1, src2, src3        dest = src1 * src2 + src3

The destination (dest) is an XMM register specified by ModRM.reg. When the destination XMM register
is written, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either
an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third
source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.

When src3 designates the same XMM register as the dest register, the XMM register behaves as an 
accumulator. 
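A short model helps show what "only the signed low-order 16 bits" means in practice. The sketch below is illustrative only; `wrap16` and `vpmacsww` are hypothetical names and the operands are plain Python lists of signed words.

```python
def wrap16(value):
    """Keep the low-order 16 bits and reinterpret them as a signed integer."""
    value &= 0xFFFF
    return value - (1 << 16) if value & 0x8000 else value


def vpmacsww(src1, src2, src3):
    """Illustrative model: all three sources hold eight signed 16-bit words."""
    assert len(src1) == len(src2) == len(src3) == 8
    # Upper product bits and any carry out of the add are discarded together.
    return [wrap16(a * b + c) for a, b, c in zip(src1, src2, src3)]
```

In this model 300 * 300 + 5 = 90005 is stored as 24469, because 90005 - 65536 = 24469.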

Instruction Support

Form         Subset  Feature Flag
VPMACSWW     XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                    Encoding
Mnemonic                                   XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSWW xmm1, xmm2, xmm3/mem128, xmm4     8F   RXB.08          0.src1.0.00  95 /r ib

Related Instructions 

VPMACSSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD, VPMACSSDQL, 
VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X    Instruction not supported, as indicated by CPUID feature identifier.
                            X     X          XOP instructions are only recognized in protected mode.
                                        X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X    XOP.W = 1.
                                        X    XOP.L = 1.
                                        X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X    Lock prefix (F0h) preceding opcode.
Device not available, #NM               X    CR0.TS = 1.
Stack, #SS                              X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Page fault, #PF                         X    Instruction execution caused a page fault.
Alignment check, #AC                    X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPMADCSSWD Packed Multiply Add Accumulate with Saturation

Signed Word to Signed Doubleword

Multiplies each packed 16-bit signed integer value of the first source by the corresponding value of 
the second source, then adds the 32-bit signed integer products of the even-odd adjacent words. Each 
resulting sum is then added to the corresponding packed 32-bit signed integer value of the third 
source. Writes four 32-bit signed-integer results to the destination. 

Out of range results of the addition are saturated to fit into a signed 32-bit integer. For each packed 
value of the destination, when the value is larger than the largest signed 32-bit integer, it is saturated 
to 7FFF_FFFFh, and when the value is smaller than the smallest signed 32-bit integer, it is saturated 
to 8000_0000h. 

There are four operands:

VPMADCSSWD dest, src1, src2, src3        dest = src1 * src2 + src3

The destination is an XMM register specified by ModRM.reg. When the destination is written, bits
[255:128] of the corresponding YMM register are cleared.

The first source is an XMM register specified by XOP.vvvv; the second source is either an XMM
register or a 128-bit memory location specified by the ModRM.r/m field; and the third source is an
XMM register specified by bits [7:4] of an immediate byte operand.

When src3 designates the same XMM register as the dest register, the XMM register behaves as an 
accumulator. 
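The multiply, add-adjacent, and accumulate steps can be sketched as below. One modeling assumption is made loudly here: the sum of the two adjacent products is held at full precision and saturation is applied once, to the final sum; the text above does not spell out the intermediate width. The names `saturate32` and `vpmadcsswd` are hypothetical.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def saturate32(value):
    """Clamp an integer to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))


def vpmadcsswd(src1, src2, src3):
    """Illustrative model: src1 and src2 hold eight signed words, src3 four signed doublewords."""
    assert len(src1) == len(src2) == 8 and len(src3) == 4
    result = []
    for i in range(4):
        # Products of the even and odd word of each adjacent pair are added together.
        pair_sum = src1[2 * i] * src2[2 * i] + src1[2 * i + 1] * src2[2 * i + 1]
        # Assumption: saturation is applied once, after the accumulate step.
        result.append(saturate32(pair_sum + src3[i]))
    return result
```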

Instruction Support

Form         Subset  Feature Flag
VPMADCSSWD   XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                    Encoding
Mnemonic                                   XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMADCSSWD xmm1, xmm2, xmm3/mem128, xmm4   8F   RXB.08          0.src1.0.00  A6 /r ib

Related Instructions 

VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD, 
VPMACSSDQL, VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSWD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X    Instruction not supported, as indicated by CPUID feature identifier.
                            X     X          XOP instructions are only recognized in protected mode.
                                        X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X    XOP.W = 1.
                                        X    XOP.L = 1.
                                        X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X    Lock prefix (F0h) preceding opcode.
Device not available, #NM               X    CR0.TS = 1.
Stack, #SS                              X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Page fault, #PF                         X    Instruction execution caused a page fault.
Alignment check, #AC                    X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPMADCSWD Packed Multiply Add Accumulate

Signed Word to Signed Doubleword 

Multiplies each packed 16-bit signed integer value of the first source by the corresponding value of 
the second source, then adds the 32-bit signed integer products of the even-odd adjacent words 
together and adds the sums to the corresponding packed 32-bit signed integer values of the third 
source. Writes four 32-bit sums to the destination. 

No saturation is performed on the sum. When the result of the addition overflows, the carry is ignored
(neither the overflow nor carry bit in rFLAGS is set). Only the low-order 32 bits of the result are
written to the destination.

There are four operands:

VPMADCSWD dest, src1, src2, src3        dest = src1 * src2 + src3

The destination is an XMM register specified by ModRM.reg. When the destination is written, bits
[255:128] of the corresponding YMM register are cleared.

The first source is an XMM register specified by XOP.vvvv; the second source is either an XMM
register or a 128-bit memory location specified by the ModRM.r/m field; and the third source is an
XMM register specified by bits [7:4] of an immediate byte operand.

When src3 designates the same XMM register as the dest register, the XMM register behaves as an 
accumulator. 
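For comparison with the saturating form, the sketch below models the wrapping variant: adjacent products are summed, the third source is added, and the total is truncated to 32 bits. The names `wrap32` and `vpmadcswd` are hypothetical, and computing the intermediate sum at full precision before truncating is a modeling convenience, not a statement about the hardware datapath.

```python
def wrap32(value):
    """Keep the low-order 32 bits and reinterpret them as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def vpmadcswd(src1, src2, src3):
    """Illustrative model: src1 and src2 hold eight signed words, src3 four signed doublewords."""
    assert len(src1) == len(src2) == 8 and len(src3) == 4
    return [
        # Sum the even/odd product pair, accumulate, then drop everything above bit 31.
        wrap32(src1[2 * i] * src2[2 * i] + src1[2 * i + 1] * src2[2 * i + 1] + src3[i])
        for i in range(4)
    ]
```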

Instruction Support

Form         Subset  Feature Flag
VPMADCSWD    XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding

                                                    Encoding
Mnemonic                                   XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMADCSWD xmm1, xmm2, xmm3/mem128, xmm4    8F   RXB.08          0.src1.0.00  B6 /r ib

Related Instructions 

VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD, 
VPMACSSDQL, VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X    Instruction not supported, as indicated by CPUID feature identifier.
                            X     X          XOP instructions are only recognized in protected mode.
                                        X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X    XOP.W = 1.
                                        X    XOP.L = 1.
                                        X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X    Lock prefix (F0h) preceding opcode.
Device not available, #NM               X    CR0.TS = 1.
Stack, #SS                              X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Page fault, #PF                         X    Instruction execution caused a page fault.
Alignment check, #AC                    X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPMASKMOVD Masked Move

Packed Doubleword 

Moves packed doublewords from a second source operand to a destination, as specified by mask bits
in a first source operand. There are load and store versions of the instruction.

The mask bits are the most-significant bit of each doubleword in the first source operand (mask). 

• For loads, when a mask bit = 1, the corresponding doubleword is copied from the source to the 
same element of the destination; when a mask bit = 0, the corresponding element of the destination 
is cleared. 

• For stores, when a mask bit = 1, the corresponding doubleword is copied from the source to the 
same element of the destination; when a mask bit = 0, the corresponding element of the destination 
is not affected. 

Exception and trap behavior for elements not selected for loading or storing from/to memory is
implementation dependent. For instance, a given implementation may signal a data breakpoint or a
page fault for doublewords that are zero-masked and not actually written.

This instruction provides no non-temporal access hint. 

This instruction has both 128-bit and 256-bit forms: 

XMM Encoding 

There are load and store encodings. 

• For loads, the four doublewords that make up the source operand are located in a 128-bit memory 
location, the mask operand is an XMM register, and the destination is an XMM register. Bits 
[255:128] of the YMM register that corresponds to the destination are cleared. 

• For stores, the four doublewords that make up the source operand are located in an XMM register, 
the mask operand is an XMM register, and the destination is a 128-bit memory location. 

YMM Encoding 

There are load and store encodings. 

• For loads, the eight doublewords that make up the source operand are located in a 256-bit memory 
location, the mask operand is a YMM register, and the destination is a YMM register. 

• For stores, the eight doublewords that make up the source operand are located in a YMM register, 
the mask operand is a YMM register, and the destination is a 256-bit memory location. 
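The difference between the load and store forms (cleared versus unaffected elements) can be sketched as follows. The function names are hypothetical, mask elements are given as unsigned 32-bit integers, and plain lists stand in for registers and memory; the same code serves the four-element and eight-element forms.

```python
def vpmaskmovd_load(mask, memory):
    """Return the destination register contents for a masked load."""
    # Elements whose mask bit 31 is clear are zeroed in the destination.
    return [m if (k >> 31) & 1 else 0 for k, m in zip(mask, memory)]


def vpmaskmovd_store(mask, source, memory):
    """Return the memory contents after a masked store."""
    # Elements whose mask bit 31 is clear keep their previous memory value.
    return [s if (k >> 31) & 1 else m for k, s, m in zip(mask, source, memory)]
```

Only bit 31 of each mask element matters in this model: 7FFF_FFFFh does not select an element, while 8000_0000h does.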


Instruction Support

Form         Subset  Feature Flag
VPMASKMOVD   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

                                             Encoding
Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
Loads:
VPMASKMOVD xmm1, xmm2, mem128    C4   RXB.02          0.src1.0.01  8C /r
VPMASKMOVD ymm1, ymm2, mem256    C4   RXB.02          0.src1.1.01  8C /r
Stores:
VPMASKMOVD mem128, xmm1, xmm2    C4   RXB.02          0.src1.0.01  8E /r
VPMASKMOVD mem256, ymm1, ymm2    C4   RXB.02          0.src1.1.01  8E /r


Related Instructions 

VPMASKMOVQ

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         A     A     A    Instruction not supported, as indicated by CPUID feature identifier.
                            A     A          AVX instructions are only recognized in protected mode.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A    Lock prefix (F0h) preceding opcode.
Device not available, #NM               A    CR0.TS = 1.
Stack, #SS                              A    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A    Memory address exceeding data segment limit or non-canonical.
                                        A    Null data segment used to reference memory.
Alignment check, #AC                    A    Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                         A    Instruction execution caused a page fault.

A — AVX2 exception 



AMD64 Technology 26568 — Rev. 3.23—February 2019 


VPMASKMOVQ Masked Move 

Packed Quadword 

Moves packed quadwords from a second source operand to a destination, as specified by mask bits in
a first source operand. There are load and store versions of the instruction.

The mask bits are the most-significant bit of each quadword in the first source operand (mask).

• For loads, when a mask bit = 1, the corresponding quadword is copied from the source to the same 
element of the destination; when a mask bit = 0, the corresponding element of the destination is 
cleared. 

• For stores, when a mask bit = 1, the corresponding quadword is copied from the source to the same 
element of the destination; when a mask bit = 0, the corresponding element of the destination is not 
affected. 

Exception and trap behavior for elements not selected for loading or storing from/to memory is
implementation dependent. For instance, a given implementation may signal a data breakpoint or a
page fault for quadwords that are zero-masked and not actually written.

This instruction provides no non-temporal access hint. 

This instruction has both 128-bit and 256-bit forms: 

XMM Encoding 

There are load and store encodings. 

• For loads, the two quadwords that make up the source operand are located in a 128-bit memory 
location, the mask operand is an XMM register, and the destination is an XMM register. Bits 
[255:128] of the YMM register that corresponds to the destination are cleared. 

• For stores, the two quadwords that make up the source operand are located in an XMM register, the 
mask operand is an XMM register, and the destination is a 128-bit memory location. 

YMM Encoding 

There are load and store encodings. 

• For loads, the four quadwords that make up the source operand are located in a 256-bit memory 
location, the mask operand is a YMM register, and the destination is a YMM register. 

• For stores, the four quadwords that make up the source operand are located in a YMM register, the 
mask operand is a YMM register, and the destination is a 256-bit memory location. 
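One common use of a masked load is handling the tail of an array whose length is not a multiple of the vector width. The sketch below illustrates that idea for four quadword lanes; all names (`tail_mask`, `vpmaskmovq_load`, `masked_sum`) and the poison value are hypothetical, and the model says nothing about faults on masked-off elements, which the text above leaves implementation dependent.

```python
MSB64 = 1 << 63  # most-significant bit of a quadword


def tail_mask(remaining, lanes=4):
    """Build a quadword mask that selects the first `remaining` lanes."""
    return [MSB64 if i < remaining else 0 for i in range(lanes)]


def vpmaskmovq_load(mask, memory):
    """Masked load: unselected quadwords are cleared in the destination."""
    return [m if k & MSB64 else 0 for k, m in zip(mask, memory)]


def masked_sum(values, lanes=4):
    """Sum a sequence in vector-sized chunks, masking off lanes past the end."""
    total = 0
    for start in range(0, len(values), lanes):
        chunk = list(values[start:start + lanes])
        mask = tail_mask(len(chunk), lanes)
        # Stand-in for whatever lies beyond the array; the mask keeps it out of the sum.
        chunk += [0xDEAD] * (lanes - len(chunk))
        total += sum(vpmaskmovq_load(mask, chunk))
    return total
```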


Instruction Support

Form         Subset  Feature Flag
VPMASKMOVQ   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding

Mnemonic

Loads:

VPMASKMOVQ xmm1, xmm2, mem128
VPMASKMOVQ ymm1, ymm2, mem256

Stores:

VPMASKMOVQ mem128, xmm1, xmm2
VPMASKMOVQ mem256, ymm1, ymm2

Related Instructions 

VPMASKMOVD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         A     A     A    Instruction not supported, as indicated by CPUID feature identifier.
                            A     A          AVX instructions are only recognized in protected mode.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A    Lock prefix (F0h) preceding opcode.
Device not available, #NM               A    CR0.TS = 1.
Stack, #SS                              A    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A    Memory address exceeding data segment limit or non-canonical.
                                        A    Null data segment used to reference memory.
Alignment check, #AC                    A    Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                         A    Instruction execution caused a page fault.

A — AVX2 exception 


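Encoding

Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
Loads:
VPMASKMOVQ xmm1, xmm2, mem128    C4   RXB.02          1.src1.0.01  8C /r
VPMASKMOVQ ymm1, ymm2, mem256    C4   RXB.02          1.src1.1.01  8C /r
Stores:
VPMASKMOVQ mem128, xmm1, xmm2    C4   RXB.02          1.src1.0.01  8E /r
VPMASKMOVQ mem256, ymm1, ymm2    C4   RXB.02          1.src1.1.01  8E /r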
VPPERM Packed Permute

Bytes 

Selects 16 of 32 packed bytes from two concatenated sources, applies a logical transformation to each 
selected byte, then writes the byte to a specified position in the destination. 

There are four operands: VPPERM dest, src1, src2, src3

The second (src2) and first (src1) sources are concatenated to form the 32-byte source.

The src1 operand is an XMM register specified by XOP.vvvv.

The third source (src3) contains 16 control bytes. Each control byte specifies the source byte and the
logical operation to perform on that byte. The order of the bytes in the destination is the same as that
of the control bytes in src3.

For each byte of the 16-byte result, the corresponding src3 byte is used as follows: 

• Bits [7:5] select a logical operation to perform on the selected byte.

Bit Value   Selected Operation
000         Source byte (no logical operation)
001         Invert source byte
010         Bit reverse of source byte
011         Bit reverse of inverted source byte
100         00h (zero-fill)
101         FFh (ones-fill)
110         Most significant bit of source byte replicated in all bit positions.
111         Invert most significant bit of source byte and replicate in all bit positions.


• Bits [4:0] select a source byte to move from src2:src1.

Bit Value  Source Byte     Bit Value  Source Byte     Bit Value  Source Byte     Bit Value  Source Byte
00000      src1[7:0]       01000      src1[71:64]     10000      src2[7:0]       11000      src2[71:64]
00001      src1[15:8]      01001      src1[79:72]     10001      src2[15:8]      11001      src2[79:72]
00010      src1[23:16]     01010      src1[87:80]     10010      src2[23:16]     11010      src2[87:80]
00011      src1[31:24]     01011      src1[95:88]     10011      src2[31:24]     11011      src2[95:88]
00100      src1[39:32]     01100      src1[103:96]    10100      src2[39:32]     11100      src2[103:96]
00101      src1[47:40]     01101      src1[111:104]   10101      src2[47:40]     11101      src2[111:104]
00110      src1[55:48]     01110      src1[119:112]   10110      src2[55:48]     11110      src2[119:112]
00111      src1[63:56]     01111      src1[127:120]   10111      src2[63:56]     11111      src2[127:120]


XOP.W and an immediate byte (imm8) determine register configuration.

• When XOP.W = 0, src2 is either an XMM register or a 128-bit memory location specified by
ModRM.r/m and src3 is an XMM register specified by imm8[7:4].

• When XOP.W = 1, src2 is an XMM register specified by imm8[7:4] and src3 is either an XMM
register or a 128-bit memory location specified by ModRM.r/m.

The destination (dest) is an XMM register specified by ModRM.reg. When the result is written to the
dest XMM register, bits [255:128] of the corresponding YMM register are cleared.
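The control-byte decoding described above can be sketched as a small reference model. It is illustrative only: `bit_reverse8` and `vpperm` are hypothetical names, and each operand is a list of sixteen byte values with element 0 corresponding to bits [7:0].

```python
def bit_reverse8(byte):
    """Reverse the order of the eight bits in a byte."""
    return int(f"{byte:08b}"[::-1], 2)


def vpperm(src1, src2, src3):
    """Illustrative model of the byte select and logical-operation steps."""
    assert len(src1) == len(src2) == len(src3) == 16
    combined = list(src1) + list(src2)  # selectors 0-15 pick src1, 16-31 pick src2
    dest = []
    for control in src3:
        byte = combined[control & 0x1F]  # bits [4:0]: source byte
        op = (control >> 5) & 0x7        # bits [7:5]: logical operation
        msb_fill = 0xFF if byte & 0x80 else 0x00
        dest.append([
            byte,                        # 000: source byte unchanged
            byte ^ 0xFF,                 # 001: inverted
            bit_reverse8(byte),          # 010: bit-reversed
            bit_reverse8(byte ^ 0xFF),   # 011: bit reverse of inverted byte
            0x00,                        # 100: zero-fill
            0xFF,                        # 101: ones-fill
            msb_fill,                    # 110: MSB replicated
            msb_fill ^ 0xFF,             # 111: inverted MSB replicated
        ][op])
    return dest
```

A control byte of 41h, for example, selects src1[15:8] and writes its bit-reversed value to the corresponding destination byte.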


Instruction Support

Form         Subset  Feature Flag
VPPERM       XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic

VPPERM xmm1, xmm2, xmm3/mem128, xmm4
VPPERM xmm1, xmm2, xmm3, xmm4/mem128

Related Instructions 

VPSHUFHW, VPSHUFD, VPSHUFLW, VPSHUFW, VPERMIL2PS, VPERMIL2PD 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X    Instruction not supported, as indicated by CPUID feature identifier.
                            X     X          XOP instructions are only recognized in protected mode.
                                        X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X    XOP.L = 1.
                                        X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X    Lock prefix (F0h) preceding opcode.
Device not available, #NM               X    CR0.TS = 1.
Stack, #SS                              X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Page fault, #PF                         X    Instruction execution caused a page fault.
Alignment check, #AC                    X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 


Encoding

Mnemonic                                XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPPERM xmm1, xmm2, xmm3/mem128, xmm4    8F   RXB.08          0.src1.0.00  A3 /r ib
VPPERM xmm1, xmm2, xmm3, xmm4/mem128    8F   RXB.08          1.src1.0.00  A3 /r ib
VPROTB Packed Rotate

Bytes 

Rotates each byte of the source as specified by a count operand and writes the result to the
corresponding byte of the destination.

There are two versions of the instruction, one for each source of the count byte: 

• VPROTB dest, src, fixed-count

• VPROTB dest, src, variable-count 

For both versions of the instruction, the destination (dest) operand is an XMM register specified by 
ModRM.reg. 

The fixed-count version of the instruction rotates each byte of the source (src) the number of bits
specified by the immediate fixed-count byte. All bytes are rotated the same amount. The source XMM
register or memory location is selected by the ModRM.r/m field.

The variable-count version of the instruction rotates each byte of the source the amount specified in
the corresponding byte element of the variable-count. Both src and variable-count are configured by
XOP.W.

• When XOP.W = 0, variable-count is an XMM register specified by XOP.vvvv and src is either an
XMM register or a 128-bit memory location specified by ModRM.r/m.

• When XOP.W = 1, variable-count is either an XMM register or a 128-bit memory location
specified by ModRM.r/m and src is an XMM register specified by XOP.vvvv.

When the count value is positive, bits are rotated to the left (toward the more significant bit
positions). The bits rotated out left of the most significant bit are rotated back in at the right end
(least-significant bit) of the byte.

When the count value is negative, bits are rotated to the right (toward the least significant bit
positions). The bits rotated to the right out of the least significant bit are rotated back in at the left end
(most-significant bit) of the byte.

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 
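The sign convention for the count can be sketched with a per-byte rotate helper. The names below are hypothetical and counts are given as signed Python integers; reducing the count modulo 8 relies only on the fact that rotating a byte by eight positions leaves it unchanged.

```python
def rotate_byte(value, count):
    """Rotate an 8-bit value left for positive counts, right for negative counts."""
    n = count % 8  # a right rotate by k equals a left rotate by 8 - k
    return ((value << n) | (value >> (8 - n))) & 0xFF


def vprotb_fixed(src, count):
    """Fixed-count form: every byte is rotated by the same amount."""
    return [rotate_byte(b, count) for b in src]


def vprotb_variable(src, counts):
    """Variable-count form: each byte uses the corresponding count element."""
    return [rotate_byte(b, c) for b, c in zip(src, counts)]
```

For example, rotating 81h by +1 gives 03h, while rotating it by -1 gives C0h.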


Instruction Support

Form         Subset  Feature Flag
VPROTB       XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

                                               Encoding
Mnemonic                           XOP  RXB.map_select  W.vvvv.L.pp   Opcode
VPROTB xmm1, xmm2/mem128, xmm3     8F   RXB.09          0.count.0.00  90 /r
VPROTB xmm1, xmm2, xmm3/mem128     8F   RXB.09          1.src.0.00    90 /r
VPROTB xmm1, xmm2/mem128, imm8     8F   RXB.08          0.1111.0.00   C0 /r ib
Related Instructions

VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAW,
VPSHAD, VPSHAQ

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X    Instruction not supported, as indicated by CPUID feature identifier.
                            X     X          XOP instructions are only recognized in protected mode.
                                        X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X    XOP.vvvv != 1111b (for immediate operand variant only).
                                        X    XOP.L field = 1.
                                        X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X    Lock prefix (F0h) preceding opcode.
Device not available, #NM               X    CR0.TS = 1.
Stack, #SS                              X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Page fault, #PF                         X    Instruction execution caused a page fault.
Alignment check, #AC                    X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
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VPROTD Packed Rotate 

Doublewords 

Rotates each doubleword of the source as specified by a count operand and writes the result to the 
corresponding doubleword of the destination. 

There are two versions of the instruction, one for each source of the count byte: 

• VPROTD dest, src, fixed-count 

• VPROTD dest, src, variable-count 

For both versions of the instruction, the dest operand is an XMM register specified by ModRM.reg. 

The fixed count version of the instruction rotates each doubleword of the source operand the number 
of bits specified by the immediate fixed-count byte operand. All doublewords are rotated the same 
amount. The src XMM register or memory location is selected by the ModRM.r/m field. 

The variable count version of the instruction rotates each doubleword of the source by the amount 
specified in the low order byte of the corresponding doubleword of the variable-count operand vector. 

Both src and variable-count are configured by XOP.W. 

• When XOP.W = 0, src is either an XMM register or a 128-bit memory location specified by the 
ModRM.r/m field and variable-count is an XMM register specified by XOP.vvvv. 

• When XOP.W = 1, src is an XMM register specified by XOP.vvvv and variable-count is either an 
XMM register or a 128-bit memory location specified by the ModRM.r/m field. 

When the count value is positive, bits are rotated to the left (toward the more significant bit
positions). The bits rotated out to the left of the most significant bit of each source doubleword operand
are rotated back in at the right end (least-significant bit) of the doubleword.

When the count value is negative, bits are rotated to the right (toward the least significant bit
positions). The bits rotated to the right out of the least significant bit of each source doubleword operand
are rotated back in at the left end (most-significant bit) of the doubleword.

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 
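
The difference between the fixed-count and variable-count versions can be sketched in Python. The function names are hypothetical, and two points are assumptions of this sketch rather than statements from this section: the immediate byte is interpreted as a signed count in the same way as a variable count, and counts outside -31 to 31 are rejected rather than modeled.

```python
MASK32 = 0xFFFFFFFF


def to_signed8(byte):
    """Interpret an 8-bit value as a two's-complement signed integer."""
    return byte - 256 if byte & 0x80 else byte


def rotate32(value, count):
    """Rotate a doubleword left (positive count) or right (negative count)."""
    if not -31 <= count <= 31:
        raise ValueError("counts outside -31..31 are not modeled in this sketch")
    n = count % 32
    value &= MASK32
    return ((value << n) | (value >> (32 - n))) & MASK32


def vprotd_fixed(src, imm8):
    """Fixed-count version: every doubleword is rotated by the same amount."""
    return [rotate32(v, to_signed8(imm8 & 0xFF)) for v in src]


def vprotd_variable(src, count_vec):
    """Variable-count version: each lane uses the low-order byte of the
    corresponding doubleword of count_vec; the upper three bytes are ignored."""
    return [rotate32(v, to_signed8(c & 0xFF)) for v, c in zip(src, count_vec)]
```

In the variable-count case, count doublewords 000000F8h and ABCDEF08h select rotations of -8 and +8 respectively, since only the low-order byte contributes.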


Instruction Support 


Form      Subset   Feature Flag
VPROTD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
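
As a small illustration of the feature flag, the check reduces to testing bit 11 of the ECX value returned by CPUID function 8000_0001h. Python cannot execute CPUID itself, so this sketch assumes the ECX value has been obtained by other means; the name has_xop and the sample values are hypothetical.

```python
XOP_BIT = 11  # CPUID Fn8000_0001_ECX[XOP]


def has_xop(ecx_8000_0001):
    """Return True if bit 11 of the supplied ECX value is set."""
    return bool((ecx_8000_0001 >> XOP_BIT) & 1)
```

The exception table for this instruction also lists CR4.OSXSAVE and XFEATURE_ENABLED_MASK[2:1] conditions, which this sketch does not check.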


Instruction Encoding 

                                   Encoding
Mnemonic                           XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPROTD xmm1, xmm2/mem128, xmm3     8F    RXB.09           0.count.0.00   92 /r
VPROTD xmm1, xmm2, xmm3/mem128     8F    RXB.09           1.src.0.00     92 /r
VPROTD xmm1, xmm2/mem128, imm8     8F    RXB.08           0.1111.0.00    C2 /r ib

Related Instructions

VPROTB, VPROTW, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAW,
VPSHAD, VPSHAQ

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.vvvv != 1111b (for immediate operand variant only).
                                       X     XOP.L field = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X - XOP exception

VPROTQ Packed Rotate Quadwords

Rotates each quadword of the source operand as specified by a count operand and writes the result to 
the corresponding quadword of the destination. 

There are two versions of the instruction, one for each source of the count byte: 

• VPROTQ dest, src, fixed-count

• VPROTQ dest, src, variable-count 

For both versions of the instruction, the dest operand is an XMM register specified by ModRM.reg. 

The fixed count version of the instruction rotates each quadword in the source the number of bits 
specified by the immediate fixed-count byte operand. All quadword elements of the source are rotated 
the same amount. The src XMM register or memory location is selected by the ModRM.r/m field. 

The variable count version of the instruction rotates each quadword of the source by the amount
specified by the low order byte of the corresponding quadword of the variable-count operand.

Both src and variable-count are configured by XOP.W. 

• When XOP.W = 0, src is either an XMM register or a 128-bit memory location specified by 
ModRM.r/m and variable-count is an XMM register specified by XOP.vvvv. 

• When XOP.W = 1, src is an XMM register specified by XOP.vvvv and variable-count is either an 
XMM register or a 128-bit memory location specified by ModRM.r/m. 

When the count value is positive, bits are rotated to the left (toward the more significant bit positions)
of the operand element. The bits rotated out to the left of the most significant bit of the quadword
element are rotated back in at the right end (least-significant bit).

When the count value is negative, operand element bits are rotated to the right (toward the least
significant bit positions). The bits rotated to the right out of the least significant bit are rotated back
in at the left end (most-significant bit) of the quadword element.

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 
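
The per-quadword behavior can be sketched in Python with both 128-bit operands held as plain integers. The name vprotq_model is hypothetical, and counts outside -63 to 63 are rejected because this sketch does not model them. The sketch relies on the identity that a right rotate by k bits equals a left rotate by 64 - k bits.

```python
MASK64 = (1 << 64) - 1


def rotl64(value, n):
    """Rotate a 64-bit value left by n bits; n is reduced modulo 64 first."""
    n %= 64
    return ((value << n) | (value >> (64 - n))) & MASK64


def vprotq_model(src128, count128):
    """Rotate each quadword of src128 by the signed low-order byte of the
    corresponding quadword of count128."""
    result = 0
    for lane in range(2):
        pos = 64 * lane
        element = (src128 >> pos) & MASK64
        count_byte = (count128 >> pos) & 0xFF
        count = count_byte - 256 if count_byte & 0x80 else count_byte
        if not -63 <= count <= 63:
            raise ValueError("counts outside -63..63 are not modeled in this sketch")
        result |= rotl64(element, count) << pos
    return result
```

With a count byte of FCh (-4) in the upper quadword, 0123456789ABCDEFh becomes F0123456789ABCDEh: the low four bits re-enter at the most-significant end.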


Instruction Support 


Form      Subset   Feature Flag
VPROTQ    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

                                   Encoding
Mnemonic                           XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPROTQ xmm1, xmm2/mem128, xmm3     8F    RXB.09           0.count.0.00   93 /r
VPROTQ xmm1, xmm2, xmm3/mem128     8F    RXB.09           1.src.0.00     93 /r
VPROTQ xmm1, xmm2/mem128, imm8     8F    RXB.08           0.1111.0.00    C3 /r ib
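
To make the encoding columns concrete, the following Python sketch packs the three XOP prefix bytes and a register-direct ModRM byte for VPROTQ. The bit layout is an assumption of the sketch, since this section lists field values only: R, X, B and vvvv are taken to be stored inverted, map_select occupies the low five bits of the second byte, and W, vvvv, L, pp fill the third byte from the most significant bit down. All function names are hypothetical, and only registers xmm0 to xmm7 with register-direct operands are handled.

```python
def xop_prefix(map_select, w, vvvv_reg, r=0, x=0, b=0, l=0, pp=0):
    """Build the 3-byte XOP prefix.

    vvvv_reg is a register number, or None when the field is unused (1111b).
    r, x, b are logical extension bits; they are stored inverted (assumption).
    """
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (map_select & 0x1F)
    vvvv = 0xF if vvvv_reg is None else (~vvvv_reg & 0xF)
    byte2 = ((w & 1) << 7) | (vvvv << 3) | ((l & 1) << 2) | (pp & 3)
    return bytes([0x8F, byte1, byte2])


def modrm_reg_direct(reg, rm):
    """ModRM byte with mod = 11b (register-direct operand)."""
    return 0xC0 | ((reg & 7) << 3) | (rm & 7)


def encode_vprotq_reg(dest, src, count):
    """W = 0 form: src in ModRM.r/m, count in XOP.vvvv, opcode 93 /r."""
    return xop_prefix(0x09, 0, count) + bytes([0x93, modrm_reg_direct(dest, src)])


def encode_vprotq_imm(dest, src, imm8):
    """Immediate form: map 08h, vvvv = 1111b, opcode C3 /r ib."""
    return xop_prefix(0x08, 0, None) + bytes([0xC3, modrm_reg_direct(dest, src), imm8 & 0xFF])
```

Under these assumptions, VPROTQ xmm1, xmm2, xmm3 encodes as 8F E9 60 93 CA, and VPROTQ xmm1, xmm2, 5 as 8F E8 78 C3 CA 05.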




Related Instructions 

VPROTB, VPROTW, VPROTD, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAW, 
VPSHAD, VPSHAQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.vvvv != 1111b (for immediate operand variant only).
                                       X     XOP.L field = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X - XOP exception

VPROTW Packed Rotate Words

Rotates each word of the source as specified by a count operand and writes the result to the
corresponding word of the destination.

There are two versions of the instruction, one for each source of the count byte: 

• VPROTW dest, src, fixed-count 

• VPROTW dest, src, variable-count 

For both versions of the instruction, the dest operand is an XMM register specified by ModRM.reg. 

The fixed count version of the instruction rotates each word of the source the number of bits specified 
by the immediate fixed-count byte operand. All words of the source operand are rotated the same 
amount. The src XMM register or memory location is selected by the ModRM.r/m field. 

The variable count version of this instruction rotates each word of the source operand by the amount 
specified in the low order byte of the corresponding word of the variable-count operand. 

Both src and variable-count are configured by XOP.W. 

• When XOP.W = 0, src is either an XMM register or a 128-bit memory location specified by 
ModRM.r/m and variable-count is an XMM register specified by XOP.vvvv. 

• When XOP.W = 1, src is an XMM register specified by XOP.vvvv and variable-count is either an 
XMM register or a 128-bit memory location specified by ModRM.r/m. 

When the count value is positive, bits are rotated to the left (toward the more significant bit
positions). The bits rotated out to the left of the most significant bit of an element are rotated back in
at the right end (least-significant bit) of the word element.

When the count value is negative, bits are rotated to the right (toward the least significant bit
positions) of the element. The bits rotated to the right out of the least significant bit of an element are
rotated back in at the left end (most-significant bit) of the word element.

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 
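
The way XOP.W selects operand roles in the variable-count version, as described in the two XOP.W cases above, can be summarized as a small lookup. This Python sketch only restates that rule; the function name operand_roles and the dictionary keys are hypothetical.

```python
def operand_roles(xop_w):
    """Return which encoding field supplies each operand of the
    variable-count version for a given XOP.W value."""
    if xop_w == 0:
        # src may be a register or 128-bit memory; the count is a register
        return {"dest": "ModRM.reg", "src": "ModRM.r/m", "variable-count": "XOP.vvvv"}
    if xop_w == 1:
        # src is a register; the count may be a register or 128-bit memory
        return {"dest": "ModRM.reg", "src": "XOP.vvvv", "variable-count": "ModRM.r/m"}
    raise ValueError("XOP.W is a single bit")
```

Only the operand located through ModRM.r/m can be a memory location, so XOP.W effectively chooses whether the source or the count vector may come from memory.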


Instruction Support 


Form      Subset   Feature Flag
VPROTW    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 


                                   Encoding
Mnemonic                           XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPROTW xmm1, xmm2/mem128, xmm3     8F    RXB.09           0.count.0.00   91 /r
VPROTW xmm1, xmm2, xmm3/mem128     8F    RXB.09           1.src.0.00     91 /r
VPROTW xmm1, xmm2/mem128, imm8     8F    RXB.08           0.1111.0.00    C1 /r ib

Related Instructions

VPROTB, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAW,
VPSHAD, VPSHAQ

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.vvvv != 1111b (for immediate operand variant only).
                                       X     XOP.L field = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X - XOP exception

VPSHAB Packed Shift Arithmetic Bytes

Shifts each signed byte of the source as specified by a count byte and writes the result to the
corresponding byte of the destination.

The count bytes are 8-bit signed two's-complement values in the corresponding bytes of the count 
operand. 

When the count value is positive, bits are shifted to the left (toward the more significant bit positions). 
Zeros are shifted in at the right end (least-significant bit) of the byte. 

When the count value is negative, bits are shifted to the right (toward the least significant bit
positions). The most significant bit (sign bit) is replicated and shifted in at the left end
(most-significant bit) of the byte.

There are three operands: VPSHAB dest, src, count 

The destination (dest) is an XMM register specified by ModRM.reg. 

Both src and count are configured by XOP.W. 

• When XOP.W = 0, count is an XMM register specified by XOP.vvvv and src is either an XMM
register or a 128-bit memory location specified by ModRM.r/m.

• When XOP.W = 1, count is either an XMM register or a 128-bit memory location specified by 
ModRM.r/m and src is an XMM register specified by XOP.vvvv. 

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 
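
The arithmetic shift of a single byte lane can be sketched in Python. The name shift_arith_byte is hypothetical, and counts outside -7 to 7 are rejected because this sketch does not model them. The sketch depends on Python's right shift of a negative integer replicating the sign, which matches the sign-bit replication described above.

```python
def shift_arith_byte(value, count):
    """Shift a signed byte left (positive count) or arithmetically right
    (negative count) and return the result as an unsigned 8-bit pattern."""
    if not -7 <= count <= 7:
        raise ValueError("counts outside -7..7 are not modeled in this sketch")
    value &= 0xFF
    signed = value - 256 if value & 0x80 else value  # reinterpret as signed
    if count >= 0:
        return (signed << count) & 0xFF   # zeros enter at the right end
    return (signed >> -count) & 0xFF      # the sign bit is replicated at the left end
```

For instance, 81h shifted right by one bit yields C0h, because the sign bit is copied into the vacated position.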


Instruction Support 


Form      Subset   Feature Flag
VPSHAB    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding 

                                   Encoding
Mnemonic                           XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPSHAB xmm1, xmm2/mem128, xmm3     8F    RXB.09           0.count.0.00   98 /r
VPSHAB xmm1, xmm2, xmm3/mem128     8F    RXB.09           1.src.0.00     98 /r

Related Instructions

VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAW,
VPSHAD, VPSHAQ

rFLAGS Affected 

None 

MXCSR Flags Affected

None

Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.L = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X - XOP exception

VPSHAD Packed Shift Arithmetic Doublewords

Shifts each signed doubleword of the source operand as specified by a count byte and writes the result 
to the corresponding doubleword of the destination. 

The count bytes are 8-bit signed two's-complement values located in the low-order byte of the
corresponding doubleword of the count operand.

When the count value is positive, bits are shifted to the left (toward the more significant bit positions). 
Zeros are shifted in at the right end (least-significant bit) of the doubleword. 

When the count value is negative, bits are shifted to the right (toward the least significant bit
positions). The most significant bit (sign bit) is replicated and shifted in at the left end
(most-significant bit) of the doubleword.

There are three operands: VPSHAD dest, src, count 

The destination (dest) is an XMM register specified by ModRM.reg. 

Both src and count are configured by XOP.W. 

• When XOP.W = 0, count is an XMM register specified by XOP.vvvv and src is either an XMM 
register or a memory location specified by ModRM.r/m. 

• When XOP.W = 1, count is either an XMM register or a memory location specified by 
ModRM.r/m and src is an XMM register specified by XOP.vvvv. 

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 
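
A vector-level sketch in Python shows how each doubleword lane takes its count from the low-order byte of the matching count doubleword. The names are hypothetical, lanes are passed as lists of four unsigned 32-bit integers, and counts outside -31 to 31 are rejected because this sketch does not model them.

```python
MASK32 = 0xFFFFFFFF


def shift_arith_dword(value, count):
    """Arithmetic shift of one signed doubleword by a signed count."""
    if not -31 <= count <= 31:
        raise ValueError("counts outside -31..31 are not modeled in this sketch")
    value &= MASK32
    signed = value - (1 << 32) if value & 0x80000000 else value
    shifted = signed << count if count >= 0 else signed >> -count
    return shifted & MASK32


def vpshad_model(src, count_vec):
    """Shift each lane by the signed low-order byte of the matching count lane."""
    result = []
    for value, count_dword in zip(src, count_vec):
        count_byte = count_dword & 0xFF
        count = count_byte - 256 if count_byte & 0x80 else count_byte
        result.append(shift_arith_dword(value, count))
    return result
```

A right shift by n bits behaves like floor division by 2**n on the signed value: FFFFFFECh (-20) shifted by a count of -2 gives FFFFFFFBh (-5).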


Instruction Support 


Form      Subset   Feature Flag
VPSHAD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

                                   Encoding
Mnemonic                           XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPSHAD xmm1, xmm2/mem128, xmm3     8F    RXB.09           0.count.0.00   9A /r
VPSHAD xmm1, xmm2, xmm3/mem128     8F    RXB.09           1.src.0.00     9A /r


Related Instructions 

VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, 
VPSHAW, VPSHAQ 

rFLAGS Affected 

None 

MXCSR Flags Affected

None

Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.L = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X - XOP exception

VPSHAQ Packed Shift Arithmetic Quadwords

Shifts each signed quadword of the source as specified by a count byte and writes the result to the
corresponding quadword of the destination.

The count bytes are 8-bit signed two's-complement values located in the low-order byte of the
corresponding quadword element of the count operand.

When the count value is positive, bits are shifted to the left (toward the more significant bit positions). 
Zeros are shifted in at the right end (least-significant bit) of the quadword. 

When the count value is negative, bits are shifted to the right (toward the least significant bit
positions). The most significant bit is replicated and shifted in at the left end (most-significant bit) of
the quadword.

The shift amount is stored in two’s-complement form. The count is modulo 64. 

There are three operands: VPSHAQ dest, src, count 

The destination (dest) is an XMM register specified by ModRM.reg. 

Both src and count are configured by XOP.W. 

• When XOP.W = 0, count is an XMM register specified by XOP.vvvv and src is either an XMM 
register or a memory location specified by ModRM.r/m. 

• When XOP.W = 1, count is either an XMM register or a memory location specified by 
ModRM.r/m and src is an XMM register specified by XOP.vvvv. 

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 


Instruction Support 


Form      Subset   Feature Flag
VPSHAQ    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

                                   Encoding
Mnemonic                           XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPSHAQ xmm1, xmm2/mem128, xmm3     8F    RXB.09           0.count.0.00   9B /r
VPSHAQ xmm1, xmm2, xmm3/mem128     8F    RXB.09           1.src.0.00     9B /r


Related Instructions 

VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, 
VPSHAW, VPSHAD

Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.L = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X - XOP exception

VPSHAW Packed Shift Arithmetic Words

Shifts each signed word of the source as specified by a count byte and writes the result to the
corresponding word of the destination.

The count bytes are 8-bit signed two's-complement values located in the low-order byte of the
corresponding word of the count operand.

When the count value is positive, bits are shifted to the left (toward the more significant bit positions). 
Zeros are shifted in at the right end (least-significant bit) of the word. 

When the count value is negative, bits are shifted to the right (toward the least significant bit
positions). The most significant bit (sign bit) is replicated and shifted in at the left end
(most-significant bit) of the word.

The shift amount is stored in two’s-complement form. The count is modulo 16. 

There are three operands: VPSHAW dest, src, count 

The destination (dest) is an XMM register specified by ModRM.reg. 

Both src and count are configured by XOP.W. 

• When XOP.W = 0, count is an XMM register specified by XOP.vvvv and src is either an XMM 
register or a memory location specified by ModRM.r/m. 

• When XOP.W = 1, count is either an XMM register or a memory location specified by 
ModRM.r/m and src is an XMM register specified by XOP.vvvv. 

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 


Instruction Support 


Form      Subset   Feature Flag
VPSHAW    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 


                                   Encoding
Mnemonic                           XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPSHAW xmm1, xmm2/mem128, xmm3     8F    RXB.09           0.count.0.00   99 /r
VPSHAW xmm1, xmm2, xmm3/mem128     8F    RXB.09           1.src.0.00     99 /r


Related Instructions 

VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, 
VPSHAD, VPSHAQ 

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.L = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X - XOP exception

VPSHLB Packed Shift Logical Bytes

Shifts each packed byte of the source as specified by a count byte and writes the result to the
corresponding byte of the destination.

The count bytes are 8-bit signed two's-complement values located in the corresponding byte element 
of the count operand. 

When the count value is positive, bits are shifted to the left (toward the more significant bit positions). 
Zeros are shifted in at the right end (least-significant bit) of the byte. 

When the count value is negative, bits are shifted to the right (toward the least significant bit
positions). Zeros are shifted in at the left end (most-significant bit) of the byte.

There are three operands: VPSHLB dest, src, count 

The destination (dest) is an XMM register specified by ModRM.reg. 

Both src and count are configured by XOP.W. 

• When XOP.W = 0, count is an XMM register specified by XOP.vvvv and src is either an XMM 
register or a memory location specified by ModRM.r/m. 

• When XOP.W = 1, count is either an XMM register or a memory location specified by 
ModRM.r/m and src is an XMM register specified by XOP.vvvv. 

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 
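
The logical shift of a byte lane can be sketched in Python as follows. The names shift_logical_byte and vpshlb_model are hypothetical, lanes are passed as lists of integers, and counts outside -7 to 7 are rejected because this sketch does not model them.

```python
def shift_logical_byte(value, count):
    """Shift an unsigned byte left (positive count) or right (negative count),
    filling the vacated positions with zeros in both directions."""
    if not -7 <= count <= 7:
        raise ValueError("counts outside -7..7 are not modeled in this sketch")
    value &= 0xFF
    if count >= 0:
        return (value << count) & 0xFF
    return value >> -count


def vpshlb_model(src, counts):
    """Apply shift_logical_byte lane by lane to 16 bytes and 16 signed counts."""
    if len(src) != 16 or len(counts) != 16:
        raise ValueError("expected 16 byte lanes")
    return [shift_logical_byte(v, c) for v, c in zip(src, counts)]
```

Here 81h shifted right by one bit gives 40h, because a zero rather than a copy of the sign bit enters at the most-significant end.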


Instruction Support 


Form      Subset   Feature Flag
VPSHLB    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

                                   Encoding
Mnemonic                           XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPSHLB xmm1, xmm2/mem128, xmm3     8F    RXB.09           0.count.0.00   94 /r
VPSHLB xmm1, xmm2, xmm3/mem128     8F    RXB.09           1.src.0.00     94 /r


Related Instructions 

VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAW, 
VPSHAD, VPSHAQ 

rFLAGS Affected 

None 

MXCSR Flags Affected

None

Exceptions

                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.L = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X - XOP exception

VPSHLD Packed Shift Logical Doublewords

Shifts each doubleword of the source operand as specified by a count byte and writes the result to the 
corresponding doubleword of the destination. 

The count bytes are 8-bit signed two's-complement values located in the low-order byte of the
corresponding doubleword element of the count operand.

When the count value is positive, bits are shifted to the left (toward the more significant bit positions). 
Zeros are shifted in at the right end (least-significant bit) of the doubleword. 

When the count value is negative, bits are shifted to the right (toward the least significant bit
positions). Zeros are shifted in at the left end (most-significant bit) of the doubleword.

The shift amount is stored in two’s-complement form. The count is modulo 32. 

There are three operands: VPSHLD dest, src, count 

The destination (dest) is an XMM register specified by ModRM.reg. 

Both src and count are configured by XOP.W. 

• When XOP.W = 0, count is an XMM register specified by XOP.vvvv and src is either an XMM 
register or a memory location specified by ModRM.r/m. 

• When XOP.W = 1, count is either an XMM register or a memory location specified by 
ModRM.r/m and src is an XMM register specified by XOP.vvvv. 

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 


Instruction Support 


Form      Subset   Feature Flag
VPSHLD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

                                   Encoding
Mnemonic                           XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPSHLD xmm1, xmm3/mem128, xmm2     8F    RXB.09           0.count.0.00   96 /r
VPSHLD xmm1, xmm2, xmm3/mem128     8F    RXB.09           1.src.0.00     96 /r


Related Instructions 

VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLQ, VPSHAB, VPSHAW, 
VPSHAD, VPSHAQ 

rFLAGS Affected 

None 




MXCSR Flags Affected 

None 


Exceptions 


                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     XOP.L = 1.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.

X - XOP exception

VPSHLQ Packed Shift Logical Quadwords

Shifts each quadword of the source as specified by a count byte and writes the result to the
corresponding quadword of the destination.

The count bytes are 8-bit signed two's-complement values located in the low-order byte of the
corresponding quadword element of the count operand.

Bit 6 of the count byte is ignored. 

When the count value is positive, bits are shifted to the left (toward the more significant bit positions). 
Zeros are shifted in at the right end (least-significant bit) of the quadword. 

When the count value is negative, bits are shifted to the right (toward the least significant bit
positions). Zeros are shifted in at the left end (most-significant bit) of the quadword.

There are three operands: VPSHLQ dest, src, count 

The destination (dest) is an XMM register specified by ModRM.reg. 

Both src and count are configured by XOP.W. 

• When XOP.W = 0, count is an XMM register specified by XOP.vvvv and src is either an XMM 
register or a memory location specified by ModRM.r/m. 

• When XOP.W = 1, count is either an XMM register or a memory location specified by 
ModRM.r/m and src is an XMM register specified by XOP.vvvv. 

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 


Instruction Support 


Form      Subset   Feature Flag
VPSHLQ    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

                                   Encoding
Mnemonic                           XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPSHLQ xmm1, xmm3/mem128, xmm2     8F    RXB.09           0.count.0.00   97 /r
VPSHLQ xmm1, xmm2, xmm3/mem128     8F    RXB.09           1.src.0.00     97 /r


Related Instructions 

VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHAB, VPSHAW, 
VPSHAD, VPSHAQ 

rFLAGS Affected 

None 
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MXCSR Flags Affected 

None 


Exceptions 


                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X     Instruction not supported, as indicated by CPUID feature identifier.
                             X     X           XOP instructions are only recognized in protected mode.
                                         X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X     XOP.L = 1.
                                         X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X     Lock prefix (F0h) preceding opcode.
Device not available, #NM                X     CR0.TS = 1.
Stack, #SS                               X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                          X     Instruction execution caused a page fault.
Alignment check, #AC                     X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPSHLW Packed Shift Logical Words

Shifts each word of the source operand as specified by a count byte and writes the result to the
corresponding word of the destination.

The count bytes are 8-bit signed two's-complement values located in the low-order byte of the
corresponding word element of the count operand.

When the count value is positive, bits are shifted to the left (toward the more significant bit positions).
Zeros are shifted in at the right end (least-significant bit) of the word.

When the count value is negative, bits are shifted to the right (toward the least significant bit
positions). Zeros are shifted in at the left end (most-significant bit) of the word.

There are three operands: VPSHLW dest, src, count 

The destination (dest) is an XMM register specified by ModRM.reg. 

Both src and count are configured by XOP.W. 

• When XOP.W = 0, count is an XMM register specified by XOP.vvvv and src is either an XMM 
register or a memory location specified by ModRM.r/m. 

• When XOP.W = 1, count is either an XMM register or a memory location specified by 
ModRM.r/m and src is an XMM register specified by XOP.vvvv. 

Bits [255:128] of the YMM register that corresponds to the destination are cleared. 
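
The signed-count rule described above can be modeled per element in Python. This is a minimal sketch under stated assumptions: `vpshlw_model` is a hypothetical name, elements are passed as plain integers, and counts whose magnitude exceeds 15 are rejected because the text above does not describe that case.

```python
def vpshlw_model(src_words, count_words):
    """Model VPSHLW on lists of 16-bit elements given as Python ints."""
    result = []
    for value, count_word in zip(src_words, count_words):
        count = count_word & 0xFF            # low-order byte of the count element
        if count >= 0x80:                    # reinterpret as signed two's complement
            count -= 0x100
        if abs(count) > 15:
            # Assumption: behavior for these counts is not stated in the text.
            raise ValueError("count outside -15..15 is not modeled")
        if count >= 0:
            shifted = (value << count) & 0xFFFF    # left shift, zeros enter at bit 0
        else:
            shifted = (value & 0xFFFF) >> -count   # right shift, zeros enter at bit 15
        result.append(shifted)
    return result
```

Only the low-order byte of each count element is examined, so a count word of 1201h is treated the same as 0001h in this model.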


Instruction Support 


Form      Subset    Feature Flag
VPSHLW    XOP       CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                            Encoding
                                    XOP    RXB.map_select    W.vvvv.L.pp     Opcode
VPSHLW xmm1, xmm3/mem128, xmm2      8F     RXB.09            0.count.0.00    95 /r
VPSHLW xmm1, xmm2, xmm3/mem128      8F     RXB.09            1.src.0.00      95 /r

Related Instructions

VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLD, VPSHLQ, VPSHAB, VPSHAW,
VPSHAD, VPSHAQ

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 

Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X     Instruction not supported, as indicated by CPUID feature identifier.
                             X     X           XOP instructions are only recognized in protected mode.
                                         X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X     XOP.L = 1.
                                         X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X     Lock prefix (F0h) preceding opcode.
Device not available, #NM                X     CR0.TS = 1.
Stack, #SS                               X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                          X     Instruction execution caused a page fault.
Alignment check, #AC                     X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
VPSLLVD Variable Shift Left Logical Doublewords

Left-shifts the bits of each doubleword in the first source operand by a count specified in the
corresponding doubleword of a second source operand and writes the shifted values to the destination.

The second source operand is treated as an array of unsigned 32-bit integers. Each integer specifies 
the shift count of the corresponding doubleword of the first source operand. Each doubleword is 
shifted independently. 

Low-order bits emptied by shifting are cleared. High-order bits shifted out of each doubleword are 
discarded. When the shift count for any doubleword is greater than 31, that doubleword is cleared in 
the destination. 
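
The per-element behavior described above, including the count-greater-than-31 case, can be sketched in Python. The function name `vpsllvd_model` is hypothetical, and elements are represented as unsigned integers in a list rather than as register contents.

```python
def vpsllvd_model(src1, counts):
    """Shift each 32-bit element of src1 left by the matching element of counts."""
    out = []
    for value, count in zip(src1, counts):
        if count > 31:
            out.append(0)                               # count > 31 clears the element
        else:
            out.append((value << count) & 0xFFFFFFFF)   # drop bits shifted past bit 31
    return out
```

The explicit count check matters when porting this to a language such as C, where shifting a 32-bit value by 32 or more is undefined rather than zero.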

This instruction has 128-bit and 256-bit encodings: 

XMM Encoding 

The first source operand is an XMM register. The shift count array is specified by either a second 
XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of 
the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The first source operand is a YMM register. The shift count array is specified by either a second 
YMM register or a 256-bit memory location. The destination is a YMM register. 

Instruction Support 


Form       Subset    Feature Flag
VPSLLVD    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
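
As a small illustration of reading a feature flag such as the one in the table above, the following Python sketch tests a single bit of a CPUID output register. Python cannot execute CPUID itself, so the register value is assumed to have been obtained elsewhere; `feature_bit_set` and `AVX2_BIT` are hypothetical names.

```python
AVX2_BIT = 5  # bit position in EBX returned by CPUID Fn0000_0007, subfunction 0

def feature_bit_set(register_value, bit):
    """Return True when the given bit of a 32-bit CPUID output register is 1."""
    return ((register_value >> bit) & 1) == 1
```

For example, `feature_bit_set(ebx, AVX2_BIT)` would report AVX2 support when `ebx` holds the EBX value returned for that function and subfunction.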


Instruction Encoding 

Mnemonic                            Encoding
                                    VEX    RXB.map_select    W.vvvv.L.pp     Opcode
VPSLLVD xmm1, xmm2, xmm3/mem128     C4     RXB.02            0.src1.0.01     47 /r
VPSLLVD ymm1, ymm2, ymm3/mem256     C4     RXB.02            0.src1.1.01     47 /r

Related Instructions

(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD,
(V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 

Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          A     A     A     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                         A     Lock prefix (F0h) preceding opcode.
Device not available, #NM                A     CR0.TS = 1.
Stack, #SS                               A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  A     Memory address exceeding data segment limit or non-canonical.
                                         A     Null data segment used to reference memory.
Alignment check, #AC                     A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                          A     Instruction execution caused a page fault.

A — AVX2 exception 
VPSLLVQ Variable Shift Left Logical Quadwords

Left-shifts the bits of each quadword in the first source operand by a count specified in the
corresponding quadword of a second source operand and writes the shifted values to the destination.

The second source operand is treated as an array of unsigned 64-bit integers. Each integer specifies
the shift count of the corresponding quadword of the first source operand. Each quadword is shifted
independently.

Low-order bits emptied by shifting are cleared. High-order bits shifted out of each quadword are
discarded. When the shift count for any quadword is greater than 63, that quadword is cleared in the
destination.

This instruction has 128-bit and 256-bit encodings: 

XMM Encoding 

The first source operand is an XMM register. The shift count array is specified by either a second 
XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of 
the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The first source operand is a YMM register. The shift count array is specified by either a second 
YMM register or a 256-bit memory location. The destination is a YMM register. 
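
The difference between the two encodings can be sketched in Python by modeling the destination as four 64-bit lanes of a YMM register. This is an illustrative model only; `vpsllvq_model` is a hypothetical name, and the encoding is inferred from the number of lanes passed in.

```python
MASK64 = (1 << 64) - 1

def vpsllvq_model(src1, counts):
    """Return the four 64-bit lanes of the destination YMM register.

    src1 and counts hold 2 lanes for the XMM encoding or 4 lanes for the
    YMM encoding, lowest lane first.
    """
    lanes = [0 if c > 63 else (v << c) & MASK64 for v, c in zip(src1, counts)]
    if len(lanes) == 2:
        # XMM encoding: bits [255:128] of the destination YMM register are cleared.
        return lanes + [0, 0]
    if len(lanes) == 4:
        # YMM encoding: all four lanes are written with shifted values.
        return lanes
    raise ValueError("expected 2 or 4 quadword lanes")
```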

Instruction Support 


Form       Subset    Feature Flag
VPSLLVQ    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                            Encoding
                                    VEX    RXB.map_select    W.vvvv.L.pp     Opcode
VPSLLVQ xmm1, xmm2, xmm3/mem128     C4     RXB.02            1.src1.0.01     47 /r
VPSLLVQ ymm1, ymm2, ymm3/mem256     C4     RXB.02            1.src1.1.01     47 /r

Related Instructions

(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD,
(V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSRAVD, VPSRLVD, VPSRLVQ

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 

Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          A     A     A     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                         A     Lock prefix (F0h) preceding opcode.
Device not available, #NM                A     CR0.TS = 1.
Stack, #SS                               A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  A     Memory address exceeding data segment limit or non-canonical.
                                         A     Null data segment used to reference memory.
Alignment check, #AC                     A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                          A     Instruction execution caused a page fault.

A — AVX2 exception 
VPSRAVD Variable Shift Right Arithmetic Doublewords

Performs a right arithmetic shift of each signed 32-bit integer in the first source operand by a count 
specified in the corresponding doubleword of a second source operand and writes the shifted values 
to the destination. 

The second source operand is treated as an array of unsigned 32-bit integers. Each integer specifies 
the shift count of the corresponding doubleword of the first source operand. Each doubleword is 
shifted independently. 

A copy of the sign bit is shifted into the most-significant bit of the element on each right-shift.
Low-order bits shifted out of each element are discarded. If a doubleword contains a positive integer
and the shift count is greater than 31, that doubleword is cleared in the destination. If a doubleword
contains a negative integer and the shift count is greater than 31, that doubleword is set to -1 in the
destination.
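
The arithmetic-shift rule above, including both out-of-range cases, can be sketched in Python. The name `vpsravd_model` is hypothetical; elements are passed and returned as unsigned 32-bit bit patterns.

```python
def vpsravd_model(src1, counts):
    """Arithmetic right shift of 32-bit elements given as unsigned bit patterns."""
    out = []
    for value, count in zip(src1, counts):
        # Reinterpret the 32-bit pattern as a signed integer.
        signed = value - (1 << 32) if value & 0x80000000 else value
        # Python's >> on ints is arithmetic. Clamping the count at 31 yields
        # 0 for non-negative inputs and -1 for negative inputs, which matches
        # the rule for counts greater than 31.
        shifted = signed >> min(count, 31)
        out.append(shifted & 0xFFFFFFFF)   # convert back to a 32-bit pattern
    return out
```

A result of FFFF_FFFFh in this model corresponds to the value -1 mentioned in the description.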

This instruction has 128-bit and 256-bit encodings: 

XMM Encoding 

The first source operand is an XMM register. The shift count array is specified by either a second 
XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of 
the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The first source operand is a YMM register. The shift count array is specified by either a second 
YMM register or a 256-bit memory location. The destination is a YMM register. 

Instruction Support 


Form       Subset    Feature Flag
VPSRAVD    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                            Encoding
                                    VEX    RXB.map_select    W.vvvv.L.pp     Opcode
VPSRAVD xmm1, xmm2, xmm3/mem128     C4     RXB.02            0.src1.0.01     46 /r
VPSRAVD ymm1, ymm2, ymm3/mem256     C4     RXB.02            0.src1.1.01     46 /r

Related Instructions

(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD,
(V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRLVD, VPSRLVQ

rFLAGS Affected 

None 
MXCSR Flags Affected

None 


Exceptions 


                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          A     A     A     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.W = 1.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                         A     Lock prefix (F0h) preceding opcode.
Device not available, #NM                A     CR0.TS = 1.
Stack, #SS                               A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  A     Memory address exceeding data segment limit or non-canonical.
                                         A     Null data segment used to reference memory.
Alignment check, #AC                     A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                          A     Instruction execution caused a page fault.

A — AVX2 exception 
VPSRLVD Variable Shift Right Logical Doublewords

Right-shifts each doubleword in the first source operand by a count specified in the corresponding 
doubleword of a second source operand and writes the shifted values to the destination. 

The second source operand is treated as an array of unsigned 32-bit integers. Each integer specifies 
the shift count of the corresponding doubleword of the first source operand. Each doubleword is 
shifted independently. 

Zero is shifted into the most-significant bit of the element on each right-shift. Low-order bits shifted
out of each element are discarded. If the shift count for any doubleword is greater than 31, that
doubleword is cleared in the destination.

This instruction has 128-bit and 256-bit encodings: 

XMM Encoding 

The first source operand is an XMM register. The shift count array is specified by either a second 
XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of 
the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The first source operand is a YMM register. The shift count array is specified by either a second 
YMM register or a 256-bit memory location. The destination is a YMM register. 

Instruction Support 


Form       Subset    Feature Flag
VPSRLVD    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                            Encoding
                                    VEX    RXB.map_select    W.vvvv.L.pp     Opcode
VPSRLVD xmm1, xmm2, xmm3/mem128     C4     RXB.02            0.src1.0.01     45 /r
VPSRLVD ymm1, ymm2, ymm3/mem256     C4     RXB.02            0.src1.1.01     45 /r

Related Instructions

(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD,
(V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVQ

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 

Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          A     A     A     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                         A     Lock prefix (F0h) preceding opcode.
Device not available, #NM                A     CR0.TS = 1.
Stack, #SS                               A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  A     Memory address exceeding data segment limit or non-canonical.
                                         A     Null data segment used to reference memory.
Alignment check, #AC                     A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                          A     Instruction execution caused a page fault.

A — AVX2 exception 
VPSRLVQ Variable Shift Right Logical Quadwords

Right-shifts each quadword in the first source operand by a count specified in the corresponding 
quadword of a second source operand and writes the shifted values to the destination. 

The second source operand is treated as an array of unsigned 64-bit integers. Each integer specifies 
the shift count of the corresponding quadword of the first source operand. Each quadword is shifted 
independently. 

Zero is shifted into the most-significant bit of the element on each right-shift. Low-order bits shifted
out of each element are discarded. If the shift count for any quadword is greater than 63, that
quadword is cleared in the destination.
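
A minimal Python sketch of the logical right shift described above follows. The name `vpsrlvq_model` is hypothetical, and elements are unsigned 64-bit integers held in a list.

```python
MASK64 = (1 << 64) - 1

def vpsrlvq_model(src1, counts):
    """Logical right shift of 64-bit elements; counts above 63 clear the element."""
    return [0 if c > 63 else (v & MASK64) >> c for v, c in zip(src1, counts)]
```

Because zeros rather than sign-bit copies enter at the top, shifting 8000_0000_0000_0000h right by 63 gives 1 in this model, not an all-ones pattern.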

This instruction has 128-bit and 256-bit encodings: 

XMM Encoding 

The first source operand is an XMM register. The shift count array is specified by either a second 
XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of 
the YMM register that corresponds to the destination are cleared. 

YMM Encoding 

The first source operand is a YMM register. The shift count array is specified by either a second 
YMM register or a 256-bit memory location. The destination is a YMM register. 

Instruction Support 


Form       Subset    Feature Flag
VPSRLVQ    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                            Encoding
                                    VEX    RXB.map_select    W.vvvv.L.pp     Opcode
VPSRLVQ xmm1, xmm2, xmm3/mem128     C4     RXB.02            1.src1.0.01     45 /r
VPSRLVQ ymm1, ymm2, ymm3/mem256     C4     RXB.02            1.src1.1.01     45 /r

Related Instructions

(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD,
(V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 

Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          A     A     A     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                         A     Lock prefix (F0h) preceding opcode.
Device not available, #NM                A     CR0.TS = 1.
Stack, #SS                               A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  A     Memory address exceeding data segment limit or non-canonical.
                                         A     Null data segment used to reference memory.
Alignment check, #AC                     A     Alignment checking enabled and:
                                               256-bit memory operand not 32-byte aligned or
                                               128-bit memory operand not 16-byte aligned.
Page fault, #PF                          A     Instruction execution caused a page fault.

A — AVX2 exception 
VTESTPD Packed Bit Test

Performs two different logical operations on the sign bits of the first and second packed floating-point 
operands and updates the ZF and CF flags based on the results. 

First, performs a bitwise AND of the sign bits of each double-precision floating-point element of the 
first source operand with the sign bits of the corresponding elements of the second source operand. 
Sets rFLAGS.ZF when all bit operations = 0; else, clears ZF. 

Second, performs a bitwise AND of the complements (NOT) of the sign bits of each double-precision
floating-point element of the first source with the sign bits of the corresponding elements of the
second source operand. Sets rFLAGS.CF when all bit operations = 0; else, clears CF.

Neither source operand is modified. 
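
The two flag computations described above can be sketched in Python. The function name `vtestpd_flags` is hypothetical, and each element is passed as the raw 64-bit pattern of a double-precision value, so bit 63 is the sign bit.

```python
SIGN64 = 1 << 63

def vtestpd_flags(src1, src2):
    """Return (ZF, CF) for lists of 64-bit double-precision bit patterns."""
    # ZF = 1 when (src1 AND src2) has no sign bit set in any element.
    zf = int(all((a & b & SIGN64) == 0 for a, b in zip(src1, src2)))
    # CF = 1 when ((NOT src1) AND src2) has no sign bit set in any element.
    cf = int(all((~a & b & SIGN64) == 0 for a, b in zip(src1, src2)))
    return zf, cf
```

Only bit 63 of each element takes part in the result; all other bits of both operands are ignored by this model, as in the description.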

This extended-form instruction has both 128-bit and 256-bit encodings.

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. 

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. 

Instruction Support 


Form       Subset    Feature Flag
VTESTPD    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                        Encoding
                                VEX    RXB.map_select    W.vvvv.L.pp     Opcode
VTESTPD xmm1, xmm2/mem128       C4     RXB.02            0.1111.0.01     0F /r
VTESTPD ymm1, ymm2/mem256       C4     RXB.02            0.1111.1.01     0F /r

Related Instructions

PTEST, VTESTPS

rFLAGS Affected

ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          0                   0    M    0    0    M
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set or cleared is M (modified). Unaffected flags are blank.
Undefined flags are U.


MXCSR Flags Affected 

None 


Exceptions 


                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             X     X           AVX instructions are only recognized in protected mode.
                             X     X     X     CR0.EM = 1.
                             X     X     X     CR4.OSFXSR = 0.
                                         X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X     VEX.W = 1.
                                         X     VEX.vvvv != 1111b.
                                         X     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    X     X     X     CR0.TS = 1.
Stack, #SS                   X     X     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      X     X     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                    X     X     Instruction execution caused a page fault.

X — AVX exception 



AMD64 Technology 


26568 — Rev. 3.23—February 2019 


VTESTPS Packed Bit Test 

Performs two different logical operations on the sign bits of the first and second packed floating-point 
operands and updates the ZF and CF flags based on the results. 

First, performs a bitwise AND of the sign bits of each single-precision floating-point element of the 
first source operand with the sign bits of the corresponding elements of the second source operand. 
Sets rFLAGS.ZF when all bit operations = 0; else, clears ZF. 

Second, performs a bitwise AND of the complements (NOT) of the sign bits of each single-precision
floating-point element of the first source with the sign bits of the corresponding elements of the
second source operand. Sets rFLAGS.CF when all bit operations = 0; else, clears CF.

Neither source operand is modified. 
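
As a sketch of how the ZF and CF results described above relate to actual floating-point values, the following Python model extracts sign bits from single-precision encodings using the standard `struct` module. The names `sign_bits` and `vtestps_flags` are hypothetical.

```python
import struct

def sign_bits(values):
    """Return the sign bit (0 or 1) of each value encoded as IEEE 754 single precision."""
    return [struct.unpack("<I", struct.pack("<f", v))[0] >> 31 for v in values]

def vtestps_flags(src1, src2):
    """Return (ZF, CF) for two lists of Python floats treated as packed singles."""
    s1, s2 = sign_bits(src1), sign_bits(src2)
    # ZF = 1 when no element has the sign bit set in both operands.
    zf = int(all((a & b) == 0 for a, b in zip(s1, s2)))
    # CF = 1 when no element has the sign bit clear in src1 and set in src2.
    cf = int(all(((a ^ 1) & b) == 0 for a, b in zip(s1, s2)))
    return zf, cf
```

One consequence of the definition: when every element of the second operand has its sign bit set, CF = 1 exactly when every element of the first operand also has its sign bit set. Negative zero counts as having its sign bit set.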

This extended-form instruction has both 128-bit and 256-bit encodings.

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. 

YMM Encoding 

The first source operand is a YMM register. The second source operand is either a YMM register or a 
256-bit memory location. 


Instruction Support 


Form       Subset    Feature Flag
VTESTPS    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic                        Encoding
                                VEX    RXB.map_select    W.vvvv.L.pp     Opcode
VTESTPS xmm1, xmm2/mem128       C4     RXB.02            0.1111.0.01     0E /r
VTESTPS ymm1, ymm2/mem256       C4     RXB.02            0.1111.1.01     0E /r

Related Instructions

PTEST, VTESTPD

rFLAGS Affected

ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          0                   0    M    0    0    M
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set or cleared is M (modified). Unaffected flags are blank.
Undefined flags are U.


MXCSR Flags Affected 

None 


Exceptions 


                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             X     X           AVX instructions are only recognized in protected mode.
                             X     X     X     CR0.EM = 1.
                             X     X     X     CR4.OSFXSR = 0.
                                         X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X     VEX.W = 1.
                                         X     VEX.vvvv != 1111b.
                                         X     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    X     X     X     CR0.TS = 1.
Stack, #SS                   X     X     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      X     X     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                         A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                    X     X     Instruction execution caused a page fault.

X — AVX exception 
VZEROALL Zero All YMM Registers

Clears all YMM registers.

In 64-bit mode, YMM0-15 are all cleared (set to all zeros). In legacy and compatibility modes, only
YMM0-7 are cleared. The contents of the MXCSR are unaffected.
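
The mode dependence described above can be sketched in Python by modeling the register file as a list of sixteen 256-bit integers. The name `vzeroall_model` and the `long_mode_64bit` flag are hypothetical; the flag stands in for the processor operating mode.

```python
def vzeroall_model(ymm, long_mode_64bit):
    """Return a copy of a 16-entry YMM register file after VZEROALL.

    ymm holds 16 integers, each a 256-bit register value.
    long_mode_64bit is True for 64-bit mode and False for legacy or
    compatibility mode.
    """
    cleared = 16 if long_mode_64bit else 8   # YMM0-15 versus YMM0-7
    return [0] * cleared + list(ymm[cleared:])
```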


Instruction Support 


Form        Subset    Feature Flag
VZEROALL    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic      Encoding
              VEX    RXB.map_select    W.vvvv.L.pp     Opcode
VZEROALL      C4     RXB.01            X.1111.1.00     77

Related Instructions

VZEROUPPER

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      A     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.W = 1.
                                         A     VEX.vvvv != 1111b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                         A     Lock prefix (F0h) preceding opcode.
Device not available, #NM                A     CR0.TS = 1.

A — AVX exception. 
VZEROUPPER Zero All YMM Registers Upper

Clears the upper octword of all YMM registers. The corresponding XMM registers (lower octword of
each YMM register) are not affected.

In 64-bit mode, the instruction operates on registers YMM0-15. In legacy and compatibility mode,
the instruction operates on YMM0-7. The contents of the MXCSR are unaffected.
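
A minimal Python sketch of the upper-octword clearing described above follows. As an assumption of this model, each YMM register is a 256-bit integer whose low 128 bits are the corresponding XMM register; `vzeroupper_model` and `long_mode_64bit` are hypothetical names.

```python
LOW128 = (1 << 128) - 1

def vzeroupper_model(ymm, long_mode_64bit):
    """Return a copy of a 16-entry YMM register file after VZEROUPPER."""
    count = 16 if long_mode_64bit else 8     # YMM0-15 versus YMM0-7
    # Keep bits [127:0] and clear bits [255:128] of each affected register.
    return [r & LOW128 if i < count else r for i, r in enumerate(ymm)]
```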


Instruction Support 


Form          Subset    Feature Flag
VZEROUPPER    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding

Mnemonic      Encoding
              VEX    RXB.map_select    W.vvvv.L.pp     Opcode
VZEROUPPER    C4     RXB.01            X.1111.0.00     77

Related Instructions

VZEROALL

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      A     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.W = 1.
                                         A     VEX.vvvv != 1111b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                         A     Lock prefix (F0h) preceding opcode.
Device not available, #NM                A     CR0.TS = 1.

A — AVX exception. 
XGETBV Get Extended Control Register Value

Copies the content of the extended control register (XCR) specified by the ECX register into the 
EDX:EAX register pair. The high-order 32 bits of the XCR are loaded into EDX and the low-order 32 
bits are loaded into EAX. The corresponding high-order 32 bits of RAX and RDX are cleared. 

This instruction and associated data structures extend the FXSAVE/FXRSTOR memory image used
to manage processor states and provide additional functionality. See the XSAVE instruction
description for more information.

Values returned to EDX:EAX in unimplemented bit locations are undefined. 

Specifying a reserved or unimplemented XCR in ECX causes a general protection exception. 

Currently, only XCR0 (the XFEATURE_ENABLED_MASK register) is supported. If CPUID reports
support for ECX=1 (see table below), then the XGETBV instruction supports an ECX value of 1.
When ECX=1, XGETBV returns the logical AND of XCR0 and the current value of the XINUSE
state-component bitmap.
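
The register split and the ECX=1 behavior described above can be sketched in Python. This is a model only: `xgetbv_model` is a hypothetical name, the XCR0 and XINUSE values are supplied by the caller, and a Python `ValueError` stands in for the general protection exception.

```python
def xgetbv_model(ecx, xcr0, xinuse, ecx1_supported=True):
    """Return (EDX, EAX) for XGETBV; raise for an unsupported XCR index."""
    if ecx == 0:
        value = xcr0
    elif ecx == 1 and ecx1_supported:
        value = xcr0 & xinuse            # logical AND of XCR0 and the XINUSE bitmap
    else:
        # Stands in for the general protection exception (#GP).
        raise ValueError("reserved or unimplemented XCR")
    eax = value & 0xFFFFFFFF             # low-order 32 bits go to EAX
    edx = (value >> 32) & 0xFFFFFFFF     # high-order 32 bits go to EDX
    return edx, eax
```

Software that depends on AVX state being enabled can apply the same idea to the XFEATURE_ENABLED_MASK[2:1] condition listed in the exception tables of this volume, by checking that `(eax >> 1) & 0b11` equals `0b11` for ECX = 0.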


Instruction Support 


Form      Subset           Feature Flag
XGETBV    XSAVE/XRSTOR     CPUID Fn0000_0001_ECX[XSAVE] (bit 26)
XGETBV    ECX=1 support    CPUID Fn0000_000D_EAX_x1[2] = 1

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic    Opcode      Description
XGETBV      0F 01 D0    Copies content of the XCR specified by ECX into EDX:EAX.

Related Instructions

RDMSR, XSETBV

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             X     X     X     Lock prefix (F0h) preceding opcode.
                             X     X     X     CR4.OSXSAVE = 0.
General protection, #GP      X     X     X     ECX specifies a reserved or unimplemented XCR address.

X - exception generated
XORPD XOR

VXORPD Packed Double-Precision Floating-Point 

Performs bitwise XOR of two packed double-precision floating-point values in the first source
operand with the corresponding values of the second source operand and writes the results into the
corresponding elements of the destination.
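
Because the operation works on the raw bit patterns rather than on numeric values, its effect is easiest to see by converting doubles to integers and back. The following Python sketch does that with the standard `struct` module; `xorpd_model` is a hypothetical name and the operands are ordinary Python lists.

```python
import struct

def xorpd_model(src1, src2):
    """XOR the 64-bit patterns of two lists of doubles, element by element."""
    out = []
    for a, b in zip(src1, src2):
        bits_a = struct.unpack("<Q", struct.pack("<d", a))[0]
        bits_b = struct.unpack("<Q", struct.pack("<d", b))[0]
        # XOR the raw encodings, then reinterpret the result as a double.
        out.append(struct.unpack("<d", struct.pack("<Q", bits_a ^ bits_b))[0])
    return out
```

Two common uses follow directly from this model: XORing with -0.0 (only the sign bit set) flips the sign of each element, and XORing an operand with itself produces all zeros.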


There are legacy and extended forms of the instruction: 

XORPD 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VXORPD 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form 

Subset 

Feature Flag 

XORPD 

SSE2 

CPUID Fn0000_0001_EDX[SSE2] (bit 26) 

VXORPD 

AVX 

CPUID Fn0000_0001_ECX[AVX] (bit 28) 


For more on using the CPUID instruction to obtain processor feature support infonnation, see Appen¬ 
dix E of Volume 3. 


Instruction Encoding 

Mnemonic                          Opcode         Description
XORPD xmm1, xmm2/mem128           66 0F 57 /r    Performs bitwise XOR of two packed double-precision
                                                 floating-point values in xmm1 with corresponding values in
                                                 xmm2 or mem128. Writes the result to xmm1.

Mnemonic                          Encoding
                                  VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VXORPD xmm1, xmm2, xmm3/mem128    C4     RXB.01            X.src1.0.01    57 /r
VXORPD ymm1, ymm2, ymm3/mem256    C4     RXB.01            X.src1.1.01    57 /r
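The VEX fields in the table above can be assembled into bytes as follows. This is a sketch under stated assumptions: it covers only the register-to-register form with registers 0 through 7, it sets the ignored W bit to 0, and it relies on the general VEX rule that the R, X, B and vvvv fields are stored inverted. The function names are hypothetical.

```python
def vex3_prefix(r, x, b, map_select, w, vvvv, l, pp) -> bytes:
    """Build a three-byte VEX prefix (C4h). R, X, B and vvvv are stored inverted."""
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (map_select & 0x1F)
    byte2 = ((w & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 0x3)
    return bytes([0xC4, byte1, byte2])

def encode_vxorpd_reg(dest: int, src1: int, src2: int, ymm: bool = False) -> bytes:
    """Encode VXORPD dest, src1, src2 for registers 0..7 (no R/X/B extension)."""
    if not all(0 <= reg <= 7 for reg in (dest, src1, src2)):
        raise ValueError("this sketch only handles registers 0..7")
    # map_select = 01, pp = 01, L selects the 128-bit (0) or 256-bit (1) form.
    prefix = vex3_prefix(0, 0, 0, 0b00001, 0, src1, int(ymm), 0b01)
    modrm = 0xC0 | (dest << 3) | src2  # mod = 11b: register-direct operand
    return prefix + bytes([0x57, modrm])

xmm_bytes = encode_vxorpd_reg(1, 2, 3)            # VXORPD xmm1, xmm2, xmm3
ymm_bytes = encode_vxorpd_reg(1, 2, 3, ymm=True)  # VXORPD ymm1, ymm2, ymm3
```

Only the L bit differs between the two encodings. Assemblers often pick the shorter two-byte VEX form (C5h) when it is available; the three-byte form shown here matches the table.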


Related Instructions 

(V)ANDNPS, (V)ANDPD, (V)ANDPS, (V)ORPD, (V)ORPS, (V)XORPS 



AMD64 Technology 


26568 — Rev. 3.23—February 2019 


rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD           X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                              A     A          AVX instructions are only recognized in protected mode.
                              S     S     S    CR0.EM = 1.
                              S     S     S    CR4.OSFXSR = 0.
                                          A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                              S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM     S     S     X    CR0.TS = 1.
Stack, #SS                    S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       S     S     X    Memory address exceeding data segment limit or non-canonical.
                              S     S     S    Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                          X    Null data segment used to reference memory.
Alignment check, #AC          S     S     S    Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                          A    Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                     S     X    Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
XORPS XOR

VXORPS Packed Single-Precision Floating-Point

Performs bitwise XOR of four packed single-precision floating-point values in the first source operand
with the corresponding values of the second source operand and writes the results into the
corresponding elements of the destination.


There are legacy and extended forms of the instruction: 

XORPS 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the 
YMM register that corresponds to the destination are not affected. 

VXORPS 

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding 

The first source operand is an XMM register. The second source operand is either an XMM register or 
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.

YMM Encoding 

The first source operand is a YMM register and the second source operand is either a YMM register 
or a 256-bit memory location. The destination is a third YMM register. 


Instruction Support 


Form      Subset    Feature Flag
XORPS     SSE       CPUID Fn0000_0001_EDX[SSE] (bit 25)
VXORPS    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic                          Opcode      Description
XORPS xmm1, xmm2/mem128           0F 57 /r    Performs bitwise XOR of four packed single-precision
                                              floating-point values in xmm1 with corresponding values in
                                              xmm2 or mem128. Writes the result to xmm1.

Mnemonic                          Encoding
                                  VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VXORPS xmm1, xmm2, xmm3/mem128    C4     RXB.01            X.src1.0.00    57 /r
VXORPS ymm1, ymm2, ymm3/mem256    C4     RXB.01            X.src1.1.00    57 /r


Related Instructions 

(V)ANDNPS, (V)ANDPD, (V)ANDPS, (V)ORPD, (V)ORPS, (V)XORPD 
rFLAGS Affected

None 

MXCSR Flags Affected 

None 


Exceptions 


                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD           X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                              A     A          AVX instructions are only recognized in protected mode.
                              S     S     S    CR0.EM = 1.
                              S     S     S    CR4.OSFXSR = 0.
                                          A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                              S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM     S     S     X    CR0.TS = 1.
Stack, #SS                    S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       S     S     X    Memory address exceeding data segment limit or non-canonical.
                              S     S     S    Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                          X    Null data segment used to reference memory.
Alignment check, #AC          S     S     S    Memory operand not 16-byte aligned when alignment checking enabled
                                               and MXCSR.MM = 1.
                                          A    Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                     S     X    Instruction execution caused a page fault.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 




XRSTOR Restore Extended States 

Restores a partial or full processor state from memory. 

This instruction and associated data structures extend the FXSAVE/FXRSTOR memory image used 
to manage processor states and provide additional functionality. See the descriptions of XSAVE and 
XRSTOR instructions for basic operational details. 

The XRSTOR instruction may operate on the buffer in standard form or a compact form. The
compact form is indicated in the memory buffer with XCOMP_BV[63]=1.

In either form, the instruction creates a Requested Feature Bit Map (RFBM) which is the logical AND
of EDX:EAX and XCR0. Then for each feature bit:

1. If RFBM = 0, XRSTOR does not update the component.

2. If RFBM = 1 but the corresponding XSTATE_BV bit is 0, the component is set to its reset state
without reading anything out of the buffer.

3. If RFBM = 1 and XSTATE_BV = 1, the component state is read from the buffer.

4. XRSTOR loads an internal state value XRSTOR_INFO that can be used to further optimize a
subsequent XSAVEOPT or XSAVES. This reflects the current privilege level and virtualization mode
as well as the save area's base address and XCOMP_BV field.

5. If RFBM = 1, the corresponding XINUSE bit is set to the state of XSTATE_BV.
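The per-component decision in steps 1 through 3 can be sketched as below. This is an illustrative software model, not processor behavior: the function name, the action strings, and the default of three components are assumptions made for the example.

```python
def xrstor_actions(edx_eax: int, xcr0: int, xstate_bv: int, num_components: int = 3):
    """Return the RFBM and the action taken for each state component."""
    rfbm = edx_eax & xcr0  # Requested Feature Bit Map
    actions = {}
    for i in range(num_components):
        if not (rfbm >> i) & 1:
            actions[i] = "unchanged"                # step 1
        elif (xstate_bv >> i) & 1:
            actions[i] = "load from buffer"         # step 3
        else:
            actions[i] = "reset to initial state"   # step 2
    return rfbm, actions

# All three components requested, two enabled in XCR0, one present in the buffer.
rfbm, actions = xrstor_actions(edx_eax=0b111, xcr0=0b011, xstate_bv=0b001)
```

In this example component 2 is requested but not enabled in XCR0, so its RFBM bit is 0 and it is left untouched.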

For standard mode, MXCSR is loaded if RFBM[1]=1 or RFBM[2]=1. It is never initialized. 

For compact mode, MXCSR is associated with RFBM[1]. 

In some generations, the FP error pointers were only restored if there was a floating-point error
logged. In newer generations, the FP error pointers are always restored. This is indicated by CPUID 
Fn8000_0008_EBX[2]. 


Instruction Support 


Form      Subset    Feature Flag
XRSTOR    XRSTOR    CPUID Fn0000_0001_ECX[XSAVE] (bit 26)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic      Opcode      Description
XRSTOR mem    0F AE /5    Restores user-specified processor state from memory.

Related Instructions 

XGETBV, XRSTORS, XSAVE, XSAVEC, XSAVES, XSETBV 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD           X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                              X     X     X    CR4.OSXSAVE = 0.
                              X     X     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM     X     X     X    CR0.TS = 1.
Stack, #SS                    X     X     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       X     X     X    Memory address exceeding data segment limit or non-canonical.
                              X     X     X    Null data segment used to reference memory.
                              X     X     X    Memory operand not aligned on 64-byte boundary.
                              X     X     X    Any must be zero (MBZ) bits in the save area were set.
                              X     X     X    Attempt to set reserved bits in MXCSR.
                              X     X     X    XCOMP_BV[i] = 0 & XSTATE_BV[i] = 1
                              X     X     X    XCOMP_BV[i] = 1 & XCR0[i] = 0
                              X     X     X    Bytes 63:16 of header are non-zero
Page fault, #PF               X     X     X    Instruction execution caused a page fault.

X — exception generated 
XRSTORS Restore Extended States Supervisor

Restores processor state from memory. 

XRSTORS is very similar to the XRSTOR instruction in compacted form with the following 
differences: 

1. XRSTORS must be executed at CPL=0.

2. XRSTORS must read XCOMP_BV[63]=1; otherwise it causes a #GP(0) exception.

3. XRSTORS is able to restore state enabled from the IA32_XSS MSR.

All other behavior is the same as XRSTOR with the compact form. 

Instruction Support 


Form       Subset    Feature Flag
XRSTORS    XSAVES    CPUID Fn0000_000D_EAX_x1[XSAVES] (bit 3)


For more on using the CPUID instruction to obtain processor feature support information, see 
Appendix E of Volume 3. 

Instruction Encoding 

Mnemonic       Opcode      Description
XRSTORS mem    0F C7 /3    Restores processor state from memory.

Related Instructions 

XGETBV, XRSTOR, XSAVE, XSAVEC, XSAVES, XSETBV 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD           X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                              X     X     X    CR4.OSXSAVE = 0.
                              X     X     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM     X     X     X    CR0.TS = 1.
Stack, #SS                    X     X     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       X     X     X    Memory address exceeding data segment limit or non-canonical.
                              X     X     X    Null data segment used to reference memory.
                              X     X     X    Memory operand not aligned on 64-byte boundary.
                              X     X     X    Any must be zero (MBZ) bits in the save area were set.
                              X     X     X    Attempt to set reserved bits in MXCSR.
                              X     X     X    CPL != 0
                              X     X     X    (XSTATE_BV[i] & ~IA32_XSS[i]) = 1
Page fault, #PF               X     X     X    Instruction execution caused a page fault.

X — exception generated 
XSAVE Save Extended States

Saves a user-defined subset of enabled processor state data to a specified memory address. 

This instruction and associated data structures extend the FXSAVE/FXRSTOR memory image used 
to manage processor states and provide additional functionality. 

The XSAVE/XRSTOR save area consists of a header section, and individual save areas for each
processor state component. A component is saved when both the corresponding bits in the mask operand
(EDX:EAX) and the XFEATURE_ENABLED_MASK (XCR0) register are set. This bit-wise logical
AND of EDX:EAX and XCR0 is known as the Requested Feature Bit Map (RFBM). A component is
not saved when its corresponding RFBM bit is zero.

Software can set any bit in EDX:EAX, regardless of whether the bit position in XCR0 is valid for the
processor. When the mask operand contains all 1s, all processor state components enabled in XCR0
are saved.

For each component saved, XSAVE sets the corresponding bit in the XSTATE_BV field of the save
area header. XSAVE does not clear XSTATE_BV bits or modify individual save areas for components
that are not saved. If a saved component is in the hardware-specified initialized state, XSAVE may
clear the corresponding XSTATE_BV bit instead of setting it. This optimization is
implementation-dependent.
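The header update described above can be modeled in a few lines. This sketch assumes the baseline behavior only: it ignores the implementation-dependent optimization that may clear XSTATE_BV bits for components in their initialized state. The function name is hypothetical.

```python
ALL_ONES = (1 << 64) - 1  # mask operand with every bit set

def xsave_update_xstate_bv(edx_eax: int, xcr0: int, old_xstate_bv: int):
    """Return (RFBM, new XSTATE_BV) for a baseline XSAVE."""
    rfbm = edx_eax & xcr0
    # Bits for saved components are set; bits for other components are preserved.
    return rfbm, old_xstate_bv | rfbm

# An all-ones mask saves every component enabled in XCR0.
full = xsave_update_xstate_bv(ALL_ONES, 0b111, 0b000)
# A partial mask leaves the previously set bit 2 untouched.
partial = xsave_update_xstate_bv(0b010, 0b111, 0b100)
```

The second call shows why a save area can accumulate state across several partial saves: bit 2 remains set even though component 2 was not written by that call.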

The MXCSR register is saved if either of RFBM bits 0 or 1 are set to 1. If there is no floating point
error present, some generations would not write out any of the FP error pointers. On newer
generations, these fields are written to zeros. This is indicated by CPUID Fn8000_0008_EBX[2].

Instruction Support 


Form     Subset          Feature Flag
XSAVE    XSAVE/XRSTOR    CPUID Fn0000_0001_ECX[XSAVE] (bit 26)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding 

Mnemonic     Opcode      Description
XSAVE mem    0F AE /4    Saves user-specified processor state to memory.

Related Instructions 

XGETBV, XRSTOR, XSAVEOPT, XSETBV 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD           X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                              X     X     X    CR4.OSXSAVE = 0.
                              X     X     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM     X     X     X    CR0.TS = 1.
Stack, #SS                    X     X     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       X     X     X    Memory address exceeding data segment limit or non-canonical.
                              X     X     X    Null data segment used to reference memory.
                              X     X     X    Memory operand not aligned on 64-byte boundary.
                              X     X     X    Attempt to write read-only memory.
Page fault, #PF               X     X     X    Instruction execution caused a page fault.

X — exception generated 
XSAVEC Save Extended States in Compacted Form

Saves a user-defined subset of enabled processor state data to a specified memory address, possibly in 
a compacted form. 

This instruction and associated data structures extend the FXSAVE/FXRSTOR memory image used to
manage processor states and provide compaction functionality for more efficient context switching.
See the XSAVE and XRSTOR instruction descriptions for basic operational details.

XSAVEC is very similar to XSAVE but provides the following alternate functionality: 

1. XSAVEC differs from XSAVE by using the init optimization and compaction. 

2. XSAVEC differs by only saving a component if its RFBM=1 and its XINUSE=1. XINUSE is a 
means by which the processor determines whether the feature is in its Initial state. 

3. XSAVEC never writes bytes 511:464 of the legacy XSAVE data structure. 

4. XSAVEC calculates XSTATE_BV by performing the logical AND of the RFBM and XINUSE 
bitmaps and writes it to the XSAVE area. 

5. XSAVEC calculates XCOMP_BV as [63]=1 and [62:0] = RFBM, and writes it to the XSAVE area.

6. XSAVEC does not modify any other parts of the header except as indicated in 4 and 5. 

7. XSAVEC uses the compacted format of the XSAVE extended region while saving state.
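Items 4 and 5 of the list above define the two header fields that XSAVEC writes. A minimal sketch of that calculation, assuming the RFBM is the logical AND of EDX:EAX and XCR0 as in XSAVE; the function name is hypothetical.

```python
COMPACT_FLAG = 1 << 63  # XCOMP_BV[63] marks the compacted format

def xsavec_header(edx_eax: int, xcr0: int, xinuse: int):
    """Return (XSTATE_BV, XCOMP_BV) as written by the XSAVEC model."""
    rfbm = edx_eax & xcr0
    xstate_bv = rfbm & xinuse                       # item 4: RFBM AND XINUSE
    xcomp_bv = COMPACT_FLAG | (rfbm & (COMPACT_FLAG - 1))  # item 5: [63]=1, [62:0]=RFBM
    return xstate_bv, xcomp_bv

# Components 0..2 requested and enabled; component 1 is in its initial state.
xstate_bv, xcomp_bv = xsavec_header(edx_eax=0b111, xcr0=0b111, xinuse=0b101)
```

Component 1 still appears in XCOMP_BV, which describes the layout of the compacted area, but not in XSTATE_BV, which records what was actually saved.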


Instruction Support 


Form      Subset    Feature Flag
XSAVEC    XSAVEC    CPUID Fn0000_000D_EAX_x1[XSAVEC] (bit 1)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3. 


Instruction Encoding 

Mnemonic      Opcode      Description
XSAVEC mem    0F C7 /4    Saves user-specified processor state to memory.

Related Instructions 

XGETBV, XRSTOR, XRSTORS, XSAVE, XSAVES, XSETBV 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD           X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                              X     X     X    CR4.OSXSAVE = 0.
                              X     X     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM     X     X     X    CR0.TS = 1.
Stack, #SS                    X     X     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       X     X     X    Memory address exceeding data segment limit or non-canonical.
                              X     X     X    Null data segment used to reference memory.
                              X     X     X    Memory operand not aligned on 64-byte boundary.
                              X     X     X    Attempt to write read-only memory.
Page fault, #PF               X     X     X    Instruction execution caused a page fault.

X — exception generated 
XSAVEOPT Save Extended States

Performance Optimized 

Saves a user-defined subset of enabled processor state data to a specified memory address. 

This instruction and associated data structures extend the FXSAVE/FXRSTOR memory image used 
to manage processor states and provide additional functionality. See the XSAVE and XRSTOR 
instruction descriptions for basic operational details. 

The XSAVE/XRSTOR save area consists of a header section, and individual save areas for each
processor state component. A component is saved when both the corresponding bits in the mask operand
(EDX:EAX) and the XFEATURE_ENABLED_MASK (XCR0) register are set. A component is not
saved when either of the corresponding bits in EDX:EAX or XCR0 is cleared.

Software can set any bit in EDX:EAX, regardless of whether the bit position in XCR0 is valid for the
processor. When the mask operand contains all 1s, all processor state components enabled in XCR0
are saved.

For each component saved, XSAVEOPT sets the corresponding bit in the XSTATE_BV field of the
save area header. XSAVEOPT does not clear XSTATE_BV bits or modify individual save areas for
components that are not saved. If a saved component is in the hardware-specified initialized state,
XSAVEOPT may clear the corresponding XSTATE_BV bit instead of setting it. This optimization is
implementation-dependent.

XSAVEOPT may provide other implementation-specific optimizations, such as the modified
optimization described for XSAVES.

Instruction Support 


Form        Subset      Feature Flag
XSAVEOPT    XSAVEOPT    CPUID Fn0000_000D_EAX_x1[XSAVEOPT] (bit 0)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.

Instruction Encoding 

Mnemonic        Opcode      Description
XSAVEOPT mem    0F AE /6    Saves user-specified processor state to memory.

Related Instructions 

XGETBV, XRSTOR, XSAVE, XSETBV 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD           X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                              X     X     X    CR4.OSXSAVE = 0.
                              X     X     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM     X     X     X    CR0.TS = 1.
Stack, #SS                    X     X     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       X     X     X    Memory address exceeding data segment limit or non-canonical.
                              X     X     X    Null data segment used to reference memory.
                              X     X     X    Memory operand not aligned on 64-byte boundary.
                              X     X     X    Attempt to write read-only memory.
Page fault, #PF               X     X     X    Instruction execution caused a page fault.

X — exception generated 
XSAVES Save Extended States Supervisor

Saves a user-defined subset of enabled processor state data to a specified memory address, possibly in 
a compacted form. 

This instruction and associated data structures extend the XSAVE/XRSTOR memory image used to
manage processor states and provide compaction functionality. See the XSAVE and XRSTOR
instruction descriptions for basic operational details.

XSAVES is very similar to XSAVEC but provides the following alternate functionality:

1. XSAVES must be executed at CPL=0.

2. XSAVES can save state enabled in the IA32_XSS MSR. The specific state elements saved are
determined by the logical AND of EDX:EAX with the logical OR of XCR0 with the IA32_XSS
MSR.

3. XSAVES can use the modified optimization to not save components, even if RFBM=1 and
XINUSE=1 for the stated component. If the component state has not been modified internally
since the last execution of XRSTOR or XRSTORS and the XRSTOR_INFO state (an execution
environment signature created by the last XRSTOR) matches the current execution state of this
XSAVES, the state save can be skipped.
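Items 2 and 3 above can be sketched as two small functions. This is an illustrative model under stated assumptions: the `modified` bitmap and the `xrstor_info_matches` flag stand in for internal processor tracking that software cannot observe, the model always applies the optimization although hardware is only permitted to, and the supervisor bit position used in the example is hypothetical.

```python
def xsaves_rfbm(edx_eax: int, xcr0: int, ia32_xss: int) -> int:
    """Item 2: EDX:EAX AND (XCR0 OR IA32_XSS)."""
    return edx_eax & (xcr0 | ia32_xss)

def xsaves_writes_component(i: int, rfbm: int, xinuse: int,
                            modified: int, xrstor_info_matches: bool) -> bool:
    """Item 3: decide whether component i is written by this model."""
    if not ((rfbm >> i) & 1 and (xinuse >> i) & 1):
        return False  # not requested, or in its initial state
    if xrstor_info_matches and not (modified >> i) & 1:
        return False  # modified optimization: the saved copy is still current
    return True

HYPOTHETICAL_SUPERVISOR_BIT = 1 << 11  # placeholder bit, not taken from the manual
mask = xsaves_rfbm(0xFFFF, 0b111, HYPOTHETICAL_SUPERVISOR_BIT)
```

With these inputs `mask` covers the three user components plus the supervisor component, which XSAVE and XSAVEC would not include because they consult XCR0 only.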


Instruction Support 


Form      Subset    Feature Flag
XSAVES    XSAVES    CPUID Fn0000_000D_EAX_x1[XSAVES] (bit 3)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 

Mnemonic      Opcode      Description
XSAVES mem    0F C7 /5    Saves user-specified processor state to memory.

Related Instructions 

XGETBV, XRSTOR, XRSTORS, XSAVE, XSAVEC, XSETBV 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD           X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                              X     X     X    CR4.OSXSAVE = 0.
                              X     X     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM     X     X     X    CR0.TS = 1.
Stack, #SS                    X     X     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       X     X     X    Memory address exceeding data segment limit or non-canonical.
                              X     X     X    Null data segment used to reference memory.
                              X     X     X    Memory operand not aligned on 64-byte boundary.
                              X     X     X    Attempt to write read-only memory.
Page fault, #PF               X     X     X    Instruction execution caused a page fault.

X — exception generated 
XSETBV Set Extended Control Register Value

Writes the content of the EDX:EAX register pair into the extended control register (XCR) specified 
by the ECX register. The high-order 32 bits of the XCR are loaded from EDX and the low-order 32 
bits are loaded from EAX. The corresponding high-order 32 bits of RAX and RDX are ignored. 

This instruction and associated data structures extend the FXSAVE/FXRSTOR memory image used
to manage processor states and provide additional functionality. See the XSAVE instruction
description for more information.

Currently, only the XFEATURE_ENABLED_MASK register (XCR0) is supported. Specifying a
reserved or unimplemented XCR in ECX causes a general protection exception (#GP).

Executing XSETBV at a privilege level other than 0 causes a general-protection exception. A general 
protection exception also occurs when software attempts to write to reserved bits of an XCR. 
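The register assembly and the #GP conditions for this instruction can be modeled as below. This is a sketch under stated assumptions: `supported_mask` is a hypothetical stand-in for the set of XCR0 bits a given processor implements, #UD conditions are not modeled, and the exception class is defined only for the example.

```python
class GeneralProtection(Exception):
    """Stand-in for a #GP exception in this model."""

def xsetbv_model(ecx: int, edx: int, eax: int, cpl: int,
                 supported_mask: int = 0b111) -> int:
    """Return the 64-bit value written to XCR0, or raise GeneralProtection."""
    if cpl != 0:
        raise GeneralProtection("CPL != 0")
    if ecx != 0:
        raise GeneralProtection("reserved or unimplemented XCR address")
    # Only the low 32 bits of RDX and RAX are used.
    value = ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)
    if value & ~supported_mask:
        raise GeneralProtection("reserved (MBZ) bits set")
    if not value & 1:
        raise GeneralProtection("XCR0[0] must be 1")
    if (value >> 1) & 0b11 == 0b10:
        raise GeneralProtection("XCR0[2:1] = 10b")
    return value

xcr0 = xsetbv_model(ecx=0, edx=0, eax=0b111, cpl=0)
```

The XCR0[2:1] = 10b check reflects that the upper YMM state cannot be enabled without the SSE state it extends.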

Instruction Support 


Form      Subset          Feature Flag
XSETBV    XSAVE/XRSTOR    CPUID Fn0000_0001_ECX[XSAVE] (bit 26)


For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.


Instruction Encoding 


Mnemonic    Opcode      Description
XSETBV      0F 01 D1    Writes the content of the EDX:EAX register pair to
                        the XCR specified by the ECX register.


Related Instructions 

XGETBV, XRSTOR, XSAVE, XSAVEOPT 

rFLAGS Affected 

None 

MXCSR Flags Affected 


None 

Exceptions 


                             Mode
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD           X           X    Instruction not supported, as indicated by CPUID feature identifier.
                              X           X    CR4.OSXSAVE = 0.
                              X           X    Lock prefix (F0h) preceding opcode.
General protection, #GP             X     X    CPL != 0.
                              X           X    ECX specifies a reserved or unimplemented XCR address.
                              X           X    Any must be zero (MBZ) bits in the XCR were set.
                              X           X    Setting XCR0[2:1] to 10b.
                              X           X    Writing 0 to XCR0[0].

X — exception generated 
3 Exception Summary


This chapter provides a ready reference to instruction exceptions. Table 3-1 shows instructions 
grouped by exception class, with the extended and legacy instruction type (if applicable). 

Hyperlinks in the table point to the exception tables which follow. 


Table 3-1. Instructions By Exception Class 


Mnemonic 

Extended Type 

Legacy Type 

Class 1 — AVX / SSE Vector Aligned (VEX.vvvv != 1111b)

MOVAPD VMOVAPD 

AVX 

SSE2 

MOVAPS VMOVAPS 

AVX 

SSE 

MOVDQA VMOVDQA 

AVX 

SSE2 

MOVNTDQ VMOVNTDQ 

AVX 

SSE2 

MOVNTPD VMOVNTPD 

AVX 

SSE2 

MOVNTPS VMOVNTPS 

AVX 

SSE 

Class 1X — SSE / AVX / AVX2 Vector (VEX.vvvv != 1111b or VEX.L=1 && !AVX2)

MOVNTDQA VMOVNTDQA 

AVX, AVX2 

SSE4.1 

Class 2 — AVX / SSE Vector (SIMD 111111) 

DIVPD VDIVPD 

AVX 

SSE2 

DIVPS VDIVPS 

AVX 

SSE 

Class 2-1 — AVX/SSE Vector (SIMD 111011) 

ADDPD VADDPD 

AVX 

SSE2 

ADDPS VADDPS 

AVX 

SSE 

ADDSUBPD VADDSUBPD 

AVX 

SSE2 

ADDSUBPS VADDSUBPS 

AVX 

SSE 

DPPS VDPPS 

AVX 

SSE4.1 

HADDPD VHADDPD 

AVX 

SSE3 

HADDPS VHADDPS 

AVX 

SSE3 

HSUBPD VHSUBPD 

AVX 

SSE3 

HSUBPS VHSUBPS 

AVX 

SSE3 

SUBPD VSUBPD 

AVX 

SSE2 

SUBPS VSUBPS 

AVX 

SSE 

Class 2-2 — AVX / SSE Vector (SIMD 000011) 

CMPPD VCMPPD 

AVX 

SSE2 

CMPPS VCMPPS 

AVX 

SSE 

MAXPD VMAXPD 

AVX 

SSE2 

MAXPS VMAXPS 

AVX 

SSE 

MINPD VMINPD 

AVX 

SSE2 

MINPS VMINPS 

AVX 

SSE 

MULPD VMULPD 

AVX 

SSE2 

MULPS VMULPS 

AVX 

SSE 

Class 2-3 — AVX / SSE Vector (SIMD 100001) 

(unused) 

— 

— 

Class 2A — AVX / SSE Vector (SIMD 111111, VEX.L = 1) 

(unused) 

— 

— 

Class 2A-1 — AVX / SSE Vector (SIMD 111011, VEX.L = 1) 

DPPD VDPPD 

AVX 

SSE4.1 

Class 2B — AVX / SSE Vector (SIMD 111111, VEX.vvvv != 1111b) 

(unused) 

— 

— 

Class 2B-1 — AVX / SSE Vector (SIMD 100000, VEX.vvvv != 1111b) 

CVTDQ2PS VCVTDQ2PS 

AVX 

SSE2 

Class 2B-2 — AVX / SSE Vector (SIMD 100001, VEX.vvvv != 1111b) 

CVTPD2DQ VCVTPD2DQ 

AVX 

SSE2 

CVTPS2DQ VCVTPS2DQ 

AVX 

SSE2 

CVTTPS2DQ VCVTTPS2DQ 

AVX 

SSE2 

CVTTPD2DQ VCVTTPD2DQ 

AVX 

SSE2 

ROUNDPD VROUNDPD

AVX 

SSE4.1 

ROUNDPS VROUNDPS

AVX 

SSE4.1 

Class 2B-3 — AVX / SSE Vector (SIMD 111011, VEX.vvvv != 1111b) 

CVTPD2PS VCVTPD2PS 

AVX 

SSE2 

Class 2B-4 — AVX / SSE Vector (SIMD 100011, VEX.vvvv != 1111b) 

SQRTPD VSQRTPD 

AVX 

SSE2 

SQRTPS VSQRTPS 

AVX 

SSE 

Class 3 — AVX / SSE Scalar (SIMD 111111) 

DIVSD VDIVSD 

AVX 

SSE2 

DIVSS VDIVSS 

AVX 

SSE 

Class 3-1 — AVX/SSE Scalar (SIMD 111011) 

ADDSD VADDSD 

AVX 

SSE2 

ADDSS VADDSS 

AVX 

SSE 

CVTSD2SS VCVTSD2SS 

AVX 

SSE2 

SUBSD VSUBSD 

AVX 

SSE2 

SUBSS VSUBSS 

AVX 

SSE 

Class 3-2 — AVX / SSE Scalar (SIMD 000011) 

CMPSD VCMPSD 

AVX 

SSE2 

CMPSS VCMPSS 

AVX 

SSE 

CVTSS2SD VCVTSS2SD 

AVX 

SSE2 

MAXSD VMAXSD 

AVX 

SSE2 

MAXSS VMAXSS 

AVX 

SSE 

MINSD VMINSD 

AVX 

SSE2 

MINSS VMINSS 

AVX 

SSE 

MULSD VMULSD 

AVX 

SSE2 

MULSS VMULSS 

AVX 

SSE 

UCOMISD VUCOMISD 

AVX 

SSE2 

UCOMISS VUCOMISS 

AVX 

SSE 

Class 3-3 — AVX / SSE Scalar (SIMD 100000) 

CVTSI2SD VCVTSI2SD 

AVX 

SSE2 

CVTSI2SS VCVTSI2SS 

AVX 

SSE 

Class 3-4 — AVX / SSE Scalar (SIMD 100001) 

ROUNDSD VROUNDSD

AVX 

SSE4.1 

ROUNDSS VROUNDSS

AVX 

SSE4.1 

Class 3-5 — AVX / SSE Scalar (SIMD 100011)

SQRTSD VSQRTSD 

AVX 

SSE2 

SQRTSS VSQRTSS 

AVX 

SSE 

Class 3A — AVX / SSE Scalar (SIMD 111111, VEX.vvvv != 1111b)

(unused) 

— 

— 

Class 3A-1 — AVX / SSE Scalar (SIMD 000011, VEX.vvvv != 1111b) 

COMISD VCOMISD 

AVX 

SSE2 

COMISS VCOMISS 

AVX 

SSE 

CVTPS2PD VCVTPS2PD 

AVX 

SSE2 

Class 3A-2 — AVX / SSE Scalar (SIMD 100001, VEX.vvvv != 1111b) 

CVTSD2SI VCVTSD2SI 

AVX 

SSE2 

CVTSS2SI VCVTSS2SI 

AVX 

SSE 

CVTTSD2SI VCVTTSD2SI 

AVX 

SSE2 

CVTTSS2SI VCVTTSS2SI 

AVX 

SSE 

Class 4 — AVX / SSE Vector 

AESDEC VAESDEC 

AVX 

AES 

AESDECLAST VAESDECLAST 

AVX 

AES 

AESENC VAESENC 

AVX 

AES 

AESENCLAST VAESENCLAST 

AVX 

AES 

AESIMC VAESIMC 

AVX 

AES 

AESKEYGENASSIST VAESKEYGENASSIST 

AVX 

AES 

ANDNPD VANDNPD 

AVX 

SSE2 

ANDNPS VANDNPS

AVX 

SSE 

ANDPD VANDPD 

AVX 

SSE2 

ANDPS VANDPS 

AVX 

SSE 

BLENDPD VBLENDPD 

AVX 

SSE4.1 

BLENDPS VBLENDPS 

AVX 

SSE4.1 

ORPD VORPD

AVX 

SSE2 

ORPS VORPS 

AVX 

SSE 

PCLMULQDQ VPCLMULQDQ 

AVX 

CLMUL 

SHUFPD VSHUFPD 

AVX 

SSE2 

SHUFPS VSHUFPS 

AVX 

SSE2 

UNPCKHPD VUNPCKHPD

AVX 

SSE2 

UNPCKHPS VUNPCKHPS

AVX 

SSE 

UNPCKLPD VUNPCKLPD 

AVX 

SSE2 

UNPCKLPS VUNPCKLPS 

AVX 

SSE 

XORPD VXORPD

AVX 

SSE2 

XORPS VXORPS

AVX 

SSE 

Class 4A — AVX / SSE Vector (VEX.W = 1) 

BLENDVPD VBLENDVPD 

AVX 

SSE4.1 

BLENDVPS VBLENDVPS 

AVX 

SSE4.1 

Class 4B — AVX / SSE Vector (VEX.L = 1) 

(unused) 

— 

— 

Class 4B-X — SSE / AVX / AVX2 (VEX.L = 1 && !AVX2)

MPSADBW VMPSADBW

AVX, AVX2 

SSE4.1 

PACKSSDW VPACKSSDW 

AVX, AVX2 

SSE2 

PACKSSWB VPACKSSWB 

AVX, AVX2 

SSE2 

PACKUSDW VPACKUSDW 

AVX, AVX2 

SSE4.1 

PACKUSWB VPACKUSWB 

AVX, AVX2 

SSE2 

PADDB VPADDB 

AVX, AVX2 

SSE2 

PADDD VPADDD 

AVX, AVX2 

SSE2 

PADDQ VPADDQ 

AVX, AVX2 

SSE2 

PADDSB VPADDSB 

AVX, AVX2 

SSE2 

PADDSW VPADDSW 

AVX, AVX2 

SSE2 

PADDUSB VPADDUSB 

AVX, AVX2 

SSE2 

PADDUSW VPADDUSW

AVX, AVX2 

SSE2 

PADDW VPADDW 

AVX, AVX2 

SSE2 

PALIGNR VPALIGNR 

AVX, AVX2 

SSSE3 

PAND VPAND 

AVX, AVX2 

SSE2 

PANDN VPANDN 

AVX, AVX2 

SSE2 

PAVGB VPAVGB 

AVX, AVX2 

SSE 

PAVGW VPAVGW 

AVX, AVX2 

SSE 

PBLENDW VPBLENDW 

AVX, AVX2 

SSE4.1 

PCMPEQB VPCMPEQB

AVX, AVX2 

SSE2 

PCMPEQD VPCMPEQD 

AVX, AVX2 

SSE2 

PCMPEQQ VPCMPEQQ 

AVX, AVX2 

SSE4.1 

PCMPEQW VPCMPEQW 

AVX, AVX2 

SSE2 

PCMPGTB VPCMPGTB

AVX, AVX2 

SSE2 

PCMPGTD VPCMPGTD

AVX, AVX2 

SSE2 

PCMPGTQ VPCMPGTQ 

AVX, AVX2 

SSE4.2 

PCMPGTW VPCMPGTW 

AVX, AVX2 

SSE2 

PHADDD VPHADDD

AVX, AVX2 

SSSE3 

PHADDSW VPHADDSW

AVX, AVX2 

SSSE3 

PHADDW VPHADDW

AVX, AVX2 

SSSE3 

PHSUBD VPHSUBD 

AVX, AVX2 

SSSE3 

PHSUBW VPHSUBW 

AVX, AVX2 

SSSE3 

PHSUBSW VPHSUBSW

AVX, AVX2 

SSSE3 

PMADDUBSW VPMADDUBSW 

AVX, AVX2 

SSSE3 

PMADDWD VPMADDWD 

AVX, AVX2 

SSE2 

PMAXSB VPMAXSB 

AVX, AVX2 

SSE4.1 

PMAXSD VPMAXSD 

AVX, AVX2 

SSE4.1 

PMAXSW VPMAXSW 

AVX, AVX2 

SSE 

PMAXUB VPMAXUB 

AVX, AVX2 

SSE 

PMAXUD VPMAXUD 

AVX, AVX2 

SSE4.1 

PMAXUW VPMAXUW 

AVX, AVX2 

SSE4.1 

PMINSB VPMINSB 

AVX, AVX2 

SSE4.1 

PMINSD VPMINSD 

AVX, AVX2 

SSE4.1 

PMINSW VPMINSW 

AVX, AVX2 

SSE 

PMINUB VPMINUB 

AVX, AVX2 

SSE 

PMINUD VPMINUD 

AVX, AVX2 

SSE4.1 

PMINUW VPMINUW 

AVX, AVX2 

SSE4.1 

PMULDQ VPMULDQ 

AVX, AVX2 

SSE4.1 

PMULHRSW VPMULHRSW 

AVX, AVX2 

SSSE3 

PMULHUW VPMULHUW 

AVX, AVX2 

SSE2 

PMULHW VPMULHW 

AVX, AVX2 

SSE2 

PMULLD VPMULLD 

AVX, AVX2 

SSE4.1 

PMULLW VPMULLW 

AVX, AVX2 

SSE2 

PMULUDQ VPMULUDQ 

AVX, AVX2 

SSE2 

POR VPOR 

AVX, AVX2 

SSE2 

PSADBW VPSADBW 

AVX, AVX2 

SSE 

PSHUFB VPSHUFB

AVX, AVX2 

SSSE3 

PSIGNB VPSIGNB 

AVX, AVX2 

SSSE3 

PSIGND VPSIGND 

AVX, AVX2 

SSSE3 

PSIGNW VPSIGNW 

AVX, AVX2 

SSSE3 

PSUBB VPSUBB 

AVX, AVX2 

SSE2 

PSUBD VPSUBD 

AVX, AVX2 

SSE2 

PSUBQ VPSUBQ 

AVX, AVX2 

SSE2 

PSUBSB VPSUBSB 

AVX, AVX2 

SSE2 

PSUBSW VPSUBSW 

AVX, AVX2 

SSE2 

PSUBUSB VPSUBUSB 

AVX, AVX2 

SSE2 

PSUBUSW VPSUBUSW

AVX, AVX2 

SSE2 

PSUBW VPSUBW

AVX, AVX2 

SSE2 

PUNPCKHBW VPUNPCKHBW

AVX, AVX2 

SSE2 

PUNPCKHDQ VPUNPCKHDQ 

AVX, AVX2 

SSE2 

PUNPCKHQDQ VPUNPCKHQDQ 

AVX, AVX2 

SSE2 

PUNPCKHWD VPUNPCKHWD 

AVX, AVX2 

SSE2 

PUNPCKLBW VPUNPCKLBW 

AVX, AVX2 

SSE2 

PUNPCKLDQ VPUNPCKLDQ 

AVX, AVX2 

SSE2 

PUNPCKLQDQ VPUNPCKLQDQ 

AVX, AVX2 

SSE2 

PUNPCKLWD VPUNPCKLWD 

AVX, AVX2 

SSE2 

PXOR VPXOR 

AVX, AVX2 

SSE2 

Class 4C — AVX / SSE Vector (VEX.vvvv != 1111b) 

MOVSHDUP VMOVSHDUP 

AVX 

SSE3 

MOVSLDUP VMOVSLDUP 

AVX 

SSE3 

PTEST VPTEST 

AVX 

SSE4.1 

RCPPS VRCPPS 

AVX 

SSE 

RSQRTPS VRSQRTPS 

AVX 

SSE 

Class 4C-1 — AVX / SSE Vector (write to RO memory, VEX.vvvv != 1111b) 

LDDQU VLDDQU 

AVX 

SSE3 

MOVDQU VMOVDQU

AVX 

SSE2 

MOVUPD VMOVUPD 

AVX 

SSE2 

MOVUPS VMOVUPS 

AVX 

SSE 

Class 4D — AVX / SSE Vector (VEX.vvvv != 1111b, VEX.L = 1 

) 

MASKMOVDQU VMASKMOVDQU 

AVX 

SSE2 

PCMPESTRI VPCMPESTRI 

AVX 

SSE4.2 

PCMPESTRM VPCMPESTRM 

AVX 

SSE4.2 

PCMPISTRI VPCMPISTRI 

AVX 

SSE4.2 

PCMPISTRM VPCMPISTRM 

AVX 

SSE4.2 

PHMINPOSUW VPHMINPOSUW 

AVX 

SSE4.1 

Class 4D-X — SSE / AVX / AVX2 Vector (VEX.vvvv != 1111b, (VEX.L = 1 && IAVX2)) 

PABSB VPABSB 

AVX, AVX2 

SSSE3 

PABSD VPABSD 

AVX, AVX2 

SSSE3 

PABSW VPABSW 

AVX, AVX2 

SSSE3 

PSHUFD VPSHUFD 

AVX, AVX2 

SSE2 

PSHUFHW VPSHUFHW 

AVX, AVX2 

SSE2 

PSHUFLW VPSHUFLW 

AVX, AVX2 

SSE2 

Class 4E — AVX / SSE Vector (VEX.W = 1, VEX.L = 1) 

(unused) 

— 

— 

Class 4E-X — SSE / AVX / AVX2 Vector (VEX.W = 1, (VEX.L = 1 && IAVX2)) 

PBLENDVB VPBLENDVB 

AVX 

SSE4.1 

Class 4F — AVX / SSE (VEX.L = 1) 

(unused) 

— 

— 

Class 4F-X — SSE / AVX / AVX2 Vector (VEX.L = 1 && IAVX2] 

1 

PSLLD VPSLLD 

AVX, AVX2 

SSE2 

PSLLQ VPSLLQ 

AVX, AVX2 

SSE2 

PSLLW VPSLLW 

AVX, AVX2 

SSE2 

PS RAD VPSRAD 

AVX, AVX2 

SSE2 

PS RAW VPS RAW 

AVX, AVX2 

SSE2 

PSRLD VPSRLD 

AVX, AVX2 

SSE2 

PSRLQ VPSRLQ 

AVX, AVX2 

SSE2 

PSRLW VPSRLW 

AVX, AVX2 

SSE2 

Class 4G — AVX Vector (VEX.W = 1, VEX.vvvv != 1111b) 



VTESTPD                     AVX            —
VTESTPS                     AVX            —

Class 4H — AVX, 256-bit only (VEX.L = 0; No SIMD Exceptions)
VPERMD                      AVX2           —
VPERMPS                     AVX2           —

Class 4H-1 — AVX2, 256-bit only (VEX.L = 0, VEX.vvvv != 1111b)
VPERMPD                     AVX2           —
VPERMQ                      AVX2           —

Class 4J — AVX2 (VEX.W = 1)
VPBLENDD                    AVX2           —
VPSRAVD                     AVX2           —

Class 4K — AVX2
VPMASKMOVD                  AVX2           —
VPMASKMOVQ                  AVX2           —
VPSLLVD                     AVX2           —
VPSLLVQ                     AVX2           —
VPSRLVD                     AVX2           —
VPSRLVQ                     AVX2           —

Class 5 — AVX / SSE Scalar
RCPSS VRCPSS                AVX            SSE
RSQRTSS VRSQRTSS            AVX            SSE

Class 5A — AVX / SSE Scalar (VEX.L = 1)
INSERTPS VINSERTPS          AVX            SSE4.1

Class 5B — AVX / SSE Scalar (VEX.vvvv != 1111b)
CVTDQ2PD VCVTDQ2PD          AVX            SSE2
MOVDDUP VMOVDDUP            AVX            SSE3

Class 5C — AVX / SSE Scalar (VEX.vvvv != 1111b, VEX.L = 1)
PINSRB VPINSRB              AVX            SSE4.1
PINSRD VPINSRD              AVX            SSE4.1
PINSRQ VPINSRQ              AVX            SSE4.1
PINSRW VPINSRW              AVX            SSE

Class 5C-X — SSE / AVX / AVX2 Scalar (VEX.vvvv != 1111b, (VEX.L = 1 && !AVX2))
PMOVSXBD VPMOVSXBD          AVX, AVX2      SSE4.1
PMOVSXBQ VPMOVSXBQ          AVX, AVX2      SSE4.1
PMOVSXBW VPMOVSXBW          AVX, AVX2      SSE4.1
PMOVSXDQ VPMOVSXDQ          AVX, AVX2      SSE4.1
PMOVSXWD VPMOVSXWD          AVX, AVX2      SSE4.1
PMOVSXWQ VPMOVSXWQ          AVX, AVX2      SSE4.1
PMOVZXBD VPMOVZXBD          AVX, AVX2      SSE4.1
PMOVZXBQ VPMOVZXBQ          AVX, AVX2      SSE4.1
PMOVZXBW VPMOVZXBW          AVX, AVX2      SSE4.1
PMOVZXDQ VPMOVZXDQ          AVX, AVX2      SSE4.1
PMOVZXWD VPMOVZXWD          AVX, AVX2      SSE4.1
PMOVZXWQ VPMOVZXWQ          AVX, AVX2      SSE4.1

Class 5C-1 — AVX / SSE Scalar (write to RO memory, VEX.vvvv != 1111b, VEX.L = 1)
EXTRACTPS VEXTRACTPS        AVX            SSE4.1
MOVD VMOVD                  AVX            SSE2
MOVQ VMOVQ                  AVX            SSE2
PEXTRB VPEXTRB              AVX            SSE4.1
PEXTRD VPEXTRD              AVX            SSE4.1
PEXTRQ VPEXTRQ              AVX            SSE4.1
PEXTRW VPEXTRW              AVX            SSE4.1

Class 5D — AVX / SSE Scalar (write to RO memory, VEX.vvvv != 1111b (variant))
MOVSD VMOVSD                AVX            SSE2
MOVSS VMOVSS                AVX            SSE

Class 5E — AVX / SSE Scalar (write to RO, VEX.vvvv != 1111b (variant), VEX.L = 1)
MOVHPD VMOVHPD              AVX            SSE2
MOVHPS VMOVHPS              AVX            SSE
MOVLPD VMOVLPD              AVX            SSE2
MOVLPS VMOVLPS              AVX            SSE

Class 6 — AVX Mixed Memory Argument
(unused)                    —              —

Class 6A — AVX Mixed Memory Argument (VEX.W = 1)
(unused)                    —              —

Class 6A-1 — AVX Mixed Memory Argument (write to RO memory, VEX.W = 1)
VMASKMOVPD                  AVX            —
VMASKMOVPS                  AVX            —

Class 6B — AVX Mixed Memory Argument (VEX.W = 1, VEX.L = 0)
VINSERTF128                 AVX            —
VINSERTI128                 AVX2           —
VPERM2F128                  AVX            —
VPERM2I128                  AVX2           —

Class 6B-1 — AVX Mixed Memory Argument (write to RO, VEX.W = 1, VEX.L = 0)
VEXTRACTF128                AVX            —

Class 6C — AVX Mixed Memory Argument (VEX.W = 1, VEX.L = 0, VEX.vvvv != 1111b)
VBROADCASTF128              AVX            —
VBROADCASTI128              AVX2           —
VEXTRACTI128                AVX2           —

Class 6C-X — AVX / AVX2 (W = 1, vvvv != 1111b, L = 0, (reg src op specified && !AVX2))
VBROADCASTSD                AVX, AVX2      —

Class 6D — AVX Mixed Memory Argument (VEX.W = 1, VEX.vvvv != 1111b)
VPBROADCASTB                AVX2           —
VPBROADCASTD                AVX2           —
VPBROADCASTQ                AVX2           —
VPBROADCASTW                AVX2           —

Class 6D-X — AVX / AVX2 (W = 1, vvvv != 1111b, (ModRM.mod = 11b && !AVX2))
VBROADCASTSS                AVX, AVX2      —

Class 6E — AVX Mixed Memory Argument (VEX.W = 1, VEX.vvvv != 1111b (variant))
VPERMILPD                   AVX            —
VPERMILPS                   AVX            —

Class 6F — AVX2 (VEX.W = 1, VEX.vvvv != 1111b, VEX.L = 0, ModRM.mod = 11b)
VBROADCASTI128              AVX2           —

Class 7 — AVX / SSE No Memory Argument
(unused)                    —              —

Class 7A — AVX / SSE No Memory Argument (VEX.L = 1)
MOVHLPS VMOVHLPS            AVX            SSE
MOVLHPS VMOVLHPS            AVX            SSE

Class 7A-X — SSE / AVX / AVX2 Vector (VEX.L = 1 && !AVX2)
PSLLDQ VPSLLDQ              AVX, AVX2      SSE2
PSRLDQ VPSRLDQ              AVX, AVX2      SSE2

Class 7B — AVX / SSE No Memory Argument (VEX.vvvv != 1111b)
MOVMSKPD VMOVMSKPD          AVX            SSE2
MOVMSKPS VMOVMSKPS          AVX            SSE

Class 7C — AVX / SSE No Memory Argument (VEX.vvvv != 1111b, VEX.L = 1)
(unused)                    —              —

Class 7C-X — SSE / AVX / AVX2 Vector (VEX.vvvv != 1111b, (VEX.L = 1 && !AVX2))
PMOVMSKB VPMOVMSKB          AVX, AVX2      SSE2

Class 8 — AVX No Memory Argument (VEX.vvvv != 1111b, VEX.W = 1)
VZEROALL                    AVX            —
VZEROUPPER                  AVX            —

Class 9 — AVX 4-byte Argument (write to RO memory, VEX.vvvv != 1111b, VEX.L = 1)
STMXCSR VSTMXCSR            AVX            SSE

Class 9A — AVX 4-byte Argument (reserved MBZ = 1, VEX.vvvv != 1111b, VEX.L = 1)
LDMXCSR VLDMXCSR            AVX            SSE
Class 10 — XOP Base
VPCMOV                      XOP            —
VPCOMB                      XOP            —
VPCOMD                      XOP            —
VPCOMQ                      XOP            —
VPCOMUB                     XOP            —
VPCOMUD                     XOP            —
VPCOMUQ                     XOP            —
VPCOMUW                     XOP            —
VPCOMW                      XOP            —
VPERMIL2PS                  XOP            —
VPERMIL2PD                  XOP            —

Class 10A — XOP Base (XOP.L = 1)
VPPERM                      XOP            —
VPSHAB                      XOP            —
VPSHAD                      XOP            —
VPSHAQ                      XOP            —
VPSHAW                      XOP            —
VPSHLB                      XOP            —
VPSHLD                      XOP            —
VPSHLQ                      XOP            —
VPSHLW                      XOP            —

Class 10B — XOP Base (XOP.W = 1, XOP.L = 1)
VPMACSDD                    XOP            —
VPMACSDQH                   XOP            —
VPMACSDQL                   XOP            —
VPMACSSDD                   XOP            —
VPMACSSDQH                  XOP            —
VPMACSSDQL                  XOP            —
VPMACSSWD                   XOP            —
VPMACSSWW                   XOP            —
VPMACSWD                    XOP            —
VPMACSWW                    XOP            —
VPMADCSSWD                  XOP            —
VPMADCSWD                   XOP            —

Class 10C — XOP Base (XOP.W = 1, XOP.vvvv != 1111b, XOP.L = 1)
VPHADDBD                    XOP            —
VPHADDBQ                    XOP            —
VPHADDBW                    XOP            —
VPHADDD                     XOP            —
VPHADDDQ                    XOP            —
VPHADDUBD                   XOP            —
VPHADDUBQ                   XOP            —
VPHADDUBW                   XOP            —
VPHADDUDQ                   XOP            —
VPHADDUWD                   XOP            —
VPHADDUWQ                   XOP            —
VPHADDWD                    XOP            —
VPHADDWQ                    XOP            —
VPHSUBBW                    XOP            —
VPHSUBDQ                    XOP            —
VPHSUBWD                    XOP            —

Class 10D — XOP Base (SIMD 110011, XOP.vvvv != 1111b, XOP.W = 1)
VFRCZPD                     XOP            —
VFRCZPS                     XOP            —
VFRCZSD                     XOP            —
VFRCZSS                     XOP            —

Class 10E — XOP Base (XOP.vvvv != 1111b (variant), XOP.L = 1)
VPROTB                      XOP            —
VPROTD                      XOP            —
VPROTQ                      XOP            —
VPROTW                      XOP            —

Class 11 — F16C Instructions
VCVTPH2PS                   F16C           —
VCVTPS2PH                   F16C           —

Class 12 — AVX2 VSID (ModRM.mod = 11b, ModRM.rm != 100b)
VGATHERDPD                  AVX2           —
VGATHERDPS                  AVX2           —
VGATHERQPD                  AVX2           —
VGATHERQPS                  AVX2           —
VPGATHERDD                  AVX2           —
VPGATHERDQ                  AVX2           —
VPGATHERQD                  AVX2           —
VPGATHERQQ                  AVX2           —

Class FMA-2 — FMA / FMA4 Vector (SIMD Exceptions PE, UE, OE, DE, IE)
VFMADDPD                    FMA4           —
VFMADDPS                    FMA4           —
VFMADDSUBPD                 FMA4           —
VFMADDSUBPS                 FMA4           —
VFMSUBADDPD                 FMA4           —
VFMSUBADDPS                 FMA4           —
VFMSUBPD                    FMA4           —
VFMSUBPS                    FMA4           —
VFNMADDPD                   FMA4           —
VFNMADDPS                   FMA4           —
VFNMSUBPD                   FMA4           —
VFNMSUBPS                   FMA4           —

Class FMA-3 — FMA / FMA4 Scalar (SIMD Exceptions PE, UE, OE, DE, IE)
VFMADDSD                    FMA4           —
VFMADDSS                    FMA4           —
VFMSUBSD                    FMA4           —
VFMSUBSS                    FMA4           —
VFNMADDSD                   FMA4           —
VFNMADDSS                   FMA4           —
VFNMSUBSD                   FMA4           —
VFNMSUBSS                   FMA4           —

Unique Cases
XGETBV                      —              —
XRSTOR                      —              —
XSAVE/XSAVEOPT              —              —
XSETBV                      —              —
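
Many of the class conditions in Table 3-1 are stated in terms of the VEX.W, VEX.vvvv, and VEX.L fields. The following is a minimal sketch of how those raw fields can be read from a VEX prefix, assuming the usual layout (two-byte form C5h followed by R.vvvv.L.pp; three-byte form C4h followed by RXB.mmmmm and then W.vvvv.L.pp). The function name vex_fields is hypothetical and is not taken from this manual.

```python
def vex_fields(prefix: bytes) -> dict:
    """Return the raw W, vvvv, and L fields of a 2- or 3-byte VEX prefix."""
    if prefix[0] == 0xC5 and len(prefix) >= 2:
        # Two-byte form: the payload byte is R vvvv L pp; W is implied 0.
        last, w = prefix[1], 0
    elif prefix[0] == 0xC4 and len(prefix) >= 3:
        # Three-byte form: the final byte is W vvvv L pp.
        last = prefix[2]
        w = (last >> 7) & 1
    else:
        raise ValueError("not a VEX prefix")
    # vvvv is returned as encoded, so an unused field reads 1111b.
    return {"W": w, "vvvv": (last >> 3) & 0xF, "L": (last >> 2) & 1}


fields = vex_fields(bytes([0xC4, 0xE2, 0x7D]))
print(fields["vvvv"] != 0b1111, fields["L"] == 1)
```

The fields are compared in their encoded form, which matches how the table states its conditions (for example, VEX.vvvv != 1111b).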
Class 1 — AVX / SSE Vector Aligned (VEX.vvvv != 1111b)

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Memory operand not aligned on a 16-byte boundary.
                            S     S     X     Write to a read-only data segment.
                                        A     VEX256: Memory operand not 32-byte aligned. VEX128: Memory operand not 16-byte aligned.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — AVX and SSE exception
A — AVX exception
S — SSE exception
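
The alignment rows of the Class 1 table reduce to one rule: the memory operand must be aligned to its own size (16 bytes for a 128-bit operand, 32 bytes for a 256-bit operand). A minimal sketch of that check follows; the function name is hypothetical and the sketch ignores the other #GP causes listed above.

```python
def class1_alignment_fault(address: int, operand_bits: int) -> bool:
    """True if a Class 1 memory operand at `address` is misaligned."""
    if operand_bits not in (128, 256):
        raise ValueError("expected a 128-bit or 256-bit operand")
    required = operand_bits // 8   # 128-bit -> 16 bytes, 256-bit -> 32 bytes
    return address % required != 0


# 0x1010 is 16-byte aligned but not 32-byte aligned.
print(class1_alignment_fault(0x1010, 128), class1_alignment_fault(0x1010, 256))
```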
Class 1X — SSE / AVX / AVX2 Vector (VEX.vvvv != 1111b or VEX.L = 1 && !AVX2)

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Memory operand not aligned on a 16-byte boundary.
                            S     S     X     Write to a read-only data segment.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — AVX, AVX2, and SSE exception
A — AVX, AVX2 exception
S — SSE exception
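
The two encoding conditions in the Class 1X heading can be written as a single predicate. The sketch below is illustrative only; the function and parameter names are hypothetical, and avx2_supported stands for whatever the CPUID feature check reports.

```python
def class1x_encoding_ud(vvvv: int, vex_l: int, avx2_supported: bool) -> bool:
    """True if a Class 1X VEX encoding raises #UD for vvvv or L reasons."""
    # vvvv must read 1111b; the 256-bit form (L = 1) needs AVX2.
    return vvvv != 0b1111 or (vex_l == 1 and not avx2_supported)


print(class1x_encoding_ud(0b1111, 1, avx2_supported=False))
```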
Class 2 — AVX / SSE Vector (SIMD 111111)

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.
Division by zero, ZE        S     S     X     Division of finite dividend by zero-value divisor.
Overflow, OE                S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception
A — AVX exception
S — SSE exception
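
The Class 2 table lists an unmasked SIMD floating-point exception twice: as #UD when CR4.OSXMMEXCPT = 0 and as #XF when CR4.OSXMMEXCPT = 1. The sketch below illustrates that selection. It assumes the standard MXCSR layout (exception flags IE, DE, ZE, OE, UE, PE in bits 0 to 5, and the matching mask bits in bits 7 to 12); the function name and the 6-bit raised argument are hypothetical.

```python
MASK_SHIFT = 7  # assumed MXCSR layout: mask bits IM..PM occupy bits 7..12

def simd_fp_fault(raised: int, mxcsr: int, osxmmexcpt: bool):
    """Return '#XF', '#UD', or None for a 6-bit set of raised exceptions."""
    masks = (mxcsr >> MASK_SHIFT) & 0x3F
    unmasked = raised & ~masks & 0x3F
    if not unmasked:
        return None                      # every raised exception is masked
    return "#XF" if osxmmexcpt else "#UD"


ZE = 1 << 2
mxcsr_ze_unmasked = 0x1F80 & ~(ZE << MASK_SHIFT)   # reset value with ZM cleared
print(simd_fp_fault(ZE, mxcsr_ze_unmasked, osxmmexcpt=True))
```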
Class 2-1 — AVX / SSE Vector (SIMD 111011)

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.
Overflow, OE                S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception
A — AVX exception
S — SSE exception
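
The six-digit value in a class heading (for example, SIMD 111011) appears to name the SIMD floating-point exceptions the class can raise. Comparing the headings with the rows listed in each table suggests the digits read, left to right, PE, UE, OE, ZE, DE, IE. That ordering is inferred from the tables rather than stated in them, so the following decoder is a sketch under that assumption; the names are hypothetical.

```python
# Assumed left-to-right digit order, inferred by comparing class headings
# with the exception rows each class table actually lists.
SIMD_BITS = ("PE", "UE", "OE", "ZE", "DE", "IE")

def simd_exceptions(mask: str) -> list:
    """Decode a heading value such as '111011' into exception names."""
    if len(mask) != 6 or set(mask) - {"0", "1"}:
        raise ValueError("expected six binary digits")
    return [name for digit, name in zip(mask, SIMD_BITS) if digit == "1"]


print(simd_exceptions("111011"))
```

Under this reading, 111011 omits only ZE, which agrees with the Class 2-1 table having no division-by-zero row.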
Class 2-2 — AVX / SSE Vector (SIMD 000011)

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.

X — AVX and SSE exception
A — AVX exception
S — SSE exception
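
Two of the #UD rows above concern operating-system enablement of extended state: CR4.OSXSAVE must be set and XFEATURE_ENABLED_MASK[2:1] must equal 11b. A minimal sketch of that pair of checks follows. It assumes CR4.OSXSAVE is bit 18 of CR4; the function name is hypothetical, and real code would obtain these values through CPUID and XGETBV rather than as plain integers.

```python
CR4_OSXSAVE = 1 << 18   # assumed bit position of CR4.OSXSAVE

def avx_state_enabled(cr4: int, xfeature_enabled_mask: int) -> bool:
    """True if neither of the two OS-enablement #UD rows applies."""
    if not (cr4 & CR4_OSXSAVE):
        return False
    # Bits 2:1 must both be set (11b).
    return ((xfeature_enabled_mask >> 1) & 0b11) == 0b11


print(avx_state_enabled(CR4_OSXSAVE, 0b111))
```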
Class 2-3 — AVX / SSE Vector (SIMD 100001)

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception
A — AVX exception
S — SSE exception
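
For the Class 2 family, the rows dealing with a memory operand that is not 16-byte aligned can be summarized as a small decision. Legacy SSE forms raise #GP when MXCSR.MM = 0, and raise #AC only when MXCSR.MM = 1 and alignment checking is enabled. AVX forms raise #AC whenever alignment checking is enabled. The sketch below encodes just those rows; the names are hypothetical and the operating-mode columns are ignored.

```python
def misaligned_fault(is_vex: bool, mxcsr_mm: bool, alignment_checking: bool):
    """Fault raised for an operand that is NOT 16-byte aligned, or None."""
    if not is_vex and not mxcsr_mm:
        return "#GP"                       # legacy SSE, misaligned mode off
    return "#AC" if alignment_checking else None


print(misaligned_fault(is_vex=False, mxcsr_mm=True, alignment_checking=True))
```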
Class 2A — AVX / SSE Vector (SIMD 111111, VEX.L = 1)

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.
Division by zero, ZE        S     S     X     Division of finite dividend by zero-value divisor.
Overflow, OE                S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception
A — AVX exception
S — SSE exception
Class 2A-1 — AVX / SSE Vector (SIMD 111011, VEX.L = 1)

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.
Overflow, OE                S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception
A — AVX exception
S — SSE exception
Class 2B — AVX / SSE Vector (SIMD 111111, VEX.vvvv != 1111b)

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Denormalized operand, DE    S     S     X     A source operand was a denormal value.
Division by zero, ZE        S     S     X     Division of finite dividend by zero-value divisor.
Overflow, OE                S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception
A — AVX exception
S — SSE exception
Class 2B-1 — AVX / SSE Vector (SIMD 100000, VEX.vvvv != 1111b)

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception
A — AVX exception
S — SSE exception
Class 2B-2 — AVX / SSE Vector (SIMD 100001, VEX.vvvv != 1111b)

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
                            S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       S     S     X     A source operand was an SNaN value.
                            S     S     X     Undefined operation.
Precision, PE               S     S     X     A result could not be represented exactly in the destination format.

X — AVX and SSE exception
A — AVX exception
S — SSE exception
Class 2B-3 — AVX / SSE Vector (SIMD 111011, VEX.vvvv != 1111b)

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | VEX.vvvv != 1111b. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | S | S | S | Non-aligned memory operand while MXCSR.MM = 0. |
| | | | X | Null data segment used to reference memory. |
| Alignment check, #AC | S | S | S | Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. |
| | | | A | Memory operand not 16-byte aligned when alignment checking enabled. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |
| SIMD floating-point, #XF | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. |
| SIMD Floating-Point Exceptions | | | | |
| Invalid operation, IE | S | S | X | A source operand was an SNaN value. |
| | S | S | X | Undefined operation. |
| Denormalized operand, DE | S | S | X | A source operand was a denormal value. |
| Overflow, OE | S | S | X | Rounded result too large to fit into the format of the destination operand. |
| Underflow, UE | S | S | X | Rounded result too small to fit into the format of the destination operand. |
| Precision, PE | S | S | X | A result could not be represented exactly in the destination format. |

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
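
The six-digit SIMD mask in a class title appears to select which SIMD floating-point exceptions the class can raise. Comparing the masks with the rows listed in each table suggests that the digits follow the MXCSR flag order, read from bit 5 (PE) down to bit 0 (IE). This reading is an inference from the tables, not a statement made by them. The following Python sketch decodes a mask under that assumption; the function name `decode_simd_mask` is hypothetical.

```python
# Assumed bit order, inferred from the tables: bit 0 = IE ... bit 5 = PE
# (the same order as the exception flags in MXCSR).
SIMD_FLAGS = ["IE", "DE", "ZE", "OE", "UE", "PE"]


def decode_simd_mask(mask: str) -> list:
    """Return the exceptions selected by a six-digit class mask such as '111011'."""
    if len(mask) != 6 or set(mask) - {"0", "1"}:
        raise ValueError("expected six binary digits")
    value = int(mask, 2)  # leftmost digit is bit 5 (PE)
    return [name for bit, name in enumerate(SIMD_FLAGS) if value & (1 << bit)]


if __name__ == "__main__":
    # Class 2B-3 uses SIMD 111011: every exception except ZE.
    print(decode_simd_mask("111011"))
```

Under this reading, mask 111011 omits only ZE, which agrees with the rows listed for Class 2B-3.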
Class 2B-4 — AVX / SSE Vector (SIMD 100011, VEX.vvvv != 1111b)

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | VEX.vvvv != 1111b. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | S | S | S | Non-aligned memory operand while MXCSR.MM = 0. |
| | | | X | Null data segment used to reference memory. |
| Alignment check, #AC | S | S | S | Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. |
| | | | A | Memory operand not 16-byte aligned when alignment checking enabled. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |
| SIMD floating-point, #XF | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. |
| SIMD Floating-Point Exceptions | | | | |
| Invalid operation, IE | S | S | X | A source operand was an SNaN value. |
| | S | S | X | Undefined operation. |
| Denormalized operand, DE | S | S | X | A source operand was a denormal value. |
| Precision, PE | S | S | X | A result could not be represented exactly in the destination format. |

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
Class 3 — AVX / SSE Scalar (SIMD 111111)

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | | | X | Null data segment used to reference memory. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |
| Alignment check, #AC | | S | X | Unaligned memory reference when alignment checking enabled. |
| SIMD floating-point, #XF | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. |
| SIMD Floating-Point Exceptions | | | | |
| Invalid operation, IE | S | S | X | A source operand was an SNaN value. |
| | S | S | X | Undefined operation. |
| Denormalized operand, DE | S | S | X | A source operand was a denormal value. |
| Division by zero, ZE | S | S | X | Division of finite dividend by zero-value divisor. |
| Overflow, OE | S | S | X | Rounded result too large to fit into the format of the destination operand. |
| Underflow, UE | S | S | X | Rounded result too small to fit into the format of the destination operand. |
| Precision, PE | S | S | X | A result could not be represented exactly in the destination format. |

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
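
The table above reports an unmasked SIMD floating-point exception as #XF when CR4.OSXMMEXCPT = 1 and as #UD when CR4.OSXMMEXCPT = 0. The Python sketch below models that choice. It assumes the standard MXCSR layout (exception flags in bits 5:0, mask bits in bits 12:7) and, as a simplification, treats the flag bits as the conditions raised by the current instruction. The function names and the plain-integer register values are hypothetical.

```python
FLAG_NAMES = ["IE", "DE", "ZE", "OE", "UE", "PE"]  # MXCSR bits 0..5


def unmasked_simd_exceptions(mxcsr: int) -> list:
    """Names of flags set in MXCSR[5:0] whose mask bit in MXCSR[12:7] is clear."""
    flags = mxcsr & 0x3F
    masks = (mxcsr >> 7) & 0x3F
    pending = flags & ~masks
    return [name for bit, name in enumerate(FLAG_NAMES) if pending & (1 << bit)]


def reported_exception(mxcsr: int, cr4_osxmmexcpt: bool):
    """Return '#XF', '#UD', or None, following the #UD and #XF rows of the table."""
    if not unmasked_simd_exceptions(mxcsr):
        return None  # every raised condition is masked, so nothing is reported
    return "#XF" if cr4_osxmmexcpt else "#UD"


if __name__ == "__main__":
    # 0x1F80 masks all six exceptions; clear the ZE mask (bit 9) and set the ZE flag (bit 2).
    mxcsr = (0x1F80 & ~(1 << 9)) | (1 << 2)
    print(unmasked_simd_exceptions(mxcsr), reported_exception(mxcsr, True))
```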
Class 3-1 — AVX / SSE Scalar (SIMD 111011)

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | | | X | Null data segment used to reference memory. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |
| Alignment check, #AC | | S | X | Unaligned memory reference when alignment checking enabled. |
| SIMD floating-point, #XF | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. |
| SIMD Floating-Point Exceptions | | | | |
| Invalid operation, IE | S | S | X | A source operand was an SNaN value. |
| | S | S | X | Undefined operation. |
| Denormalized operand, DE | S | S | X | A source operand was a denormal value. |
| Overflow, OE | S | S | X | Rounded result too large to fit into the format of the destination operand. |
| Underflow, UE | S | S | X | Rounded result too small to fit into the format of the destination operand. |
| Precision, PE | S | S | X | A result could not be represented exactly in the destination format. |

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
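
Several #UD rows above depend on whether the operating system has enabled extended state: CR4.OSXSAVE, reflected in CPUID Fn0000_0001_ECX[OSXSAVE], and XFEATURE_ENABLED_MASK[2:1]. The Python sketch below applies those checks to register values supplied by the caller. It assumes the feature in question is AVX (ECX bit 28) and that OSXSAVE is ECX bit 27. The function name and the idea of passing raw register values as integers are illustrative only.

```python
from typing import Optional

OSXSAVE_BIT = 27  # CPUID Fn0000_0001_ECX[OSXSAVE]
AVX_BIT = 28      # CPUID Fn0000_0001_ECX[AVX] (assumed feature for this example)


def avx_ud_reason(cpuid_1_ecx: int, xfeature_enabled_mask: int) -> Optional[str]:
    """Return the first table row that would force #UD, or None if AVX is usable."""
    if not (cpuid_1_ecx >> AVX_BIT) & 1:
        return "Instruction not supported, as indicated by CPUID feature identifier."
    if not (cpuid_1_ecx >> OSXSAVE_BIT) & 1:
        return "CR4.OSXSAVE = 0."
    # Bits 2:1 of XFEATURE_ENABLED_MASK (XCR0) must both be set.
    if (xfeature_enabled_mask >> 1) & 0b11 != 0b11:
        return "XFEATURE_ENABLED_MASK[2:1] != 11b."
    return None


if __name__ == "__main__":
    ecx = (1 << AVX_BIT) | (1 << OSXSAVE_BIT)
    print(avx_ud_reason(ecx, 0b111))  # both state components enabled
    print(avx_ud_reason(ecx, 0b011))  # bit 2 clear
```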
Class 3-2 — AVX / SSE Scalar (SIMD 000011)

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | | | X | Null data segment used to reference memory. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |
| Alignment check, #AC | | S | X | Unaligned memory reference when alignment checking enabled. |
| SIMD floating-point, #XF | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. |
| SIMD Floating-Point Exceptions | | | | |
| Invalid operation, IE | S | S | X | A source operand was an SNaN value. |
| | S | S | X | Undefined operation. |
| Denormalized operand, DE | S | S | X | A source operand was a denormal value. |

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
Class 3-3 — AVX / SSE Scalar (SIMD 100000)

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | | | X | Null data segment used to reference memory. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |
| Alignment check, #AC | | S | X | Unaligned memory reference when alignment checking enabled. |
| SIMD floating-point, #XF | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. |
| SIMD Floating-Point Exceptions | | | | |
| Precision, PE | S | S | X | A result could not be represented exactly in the destination format. |

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
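
The #SS and #GP rows above name a non-canonical memory address as one cause. As a minimal sketch of what canonical form means, the Python function below checks that the upper bits of a 64-bit address all equal the most significant implemented bit. It assumes a 48-bit virtual address width; that width and the name `is_canonical` are assumptions of this example, not values taken from the table.

```python
def is_canonical(address: int, va_bits: int = 48) -> bool:
    """True if bits 63 down to (va_bits - 1) of the address are all 0 or all 1."""
    address &= (1 << 64) - 1                    # treat the input as a 64-bit value
    upper = address >> (va_bits - 1)            # bits 63..(va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1    # the same field with every bit set
    return upper in (0, all_ones)


if __name__ == "__main__":
    print(is_canonical(0x00007FFFFFFFFFFF))  # highest canonical address in the lower half
    print(is_canonical(0x0000800000000000))  # first non-canonical address above it
```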
Class 3-4 — AVX / SSE Scalar (SIMD 100001)

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | | | X | Null data segment used to reference memory. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |
| Alignment check, #AC | | S | X | Unaligned memory reference when alignment checking enabled. |
| SIMD floating-point, #XF | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. |
| SIMD Floating-Point Exceptions | | | | |
| Invalid operation, IE | S | S | X | A source operand was an SNaN value. |
| | S | S | X | Undefined operation. |
| Precision, PE | S | S | X | A result could not be represented exactly in the destination format. |

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 


26568 — Rev. 3.23—February 2019 



Class 3-5 — AVX / SSE Scalar (SIMD 100011)

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | | | X | Null data segment used to reference memory. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |
| Alignment check, #AC | | S | X | Unaligned memory reference when alignment checking enabled. |
| SIMD floating-point, #XF | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. |
| SIMD Floating-Point Exceptions | | | | |
| Invalid operation, IE | S | S | X | A source operand was an SNaN value. |
| | S | S | X | Undefined operation. |
| Denormalized operand, DE | S | S | X | A source operand was a denormal value. |
| Precision, PE | S | S | X | A result could not be represented exactly in the destination format. |

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
Class 3A — AVX / SSE Scalar (SIMD 111111, VEX.vvvv != 1111b)

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | VEX.vvvv != 1111b. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | | | X | Null data segment used to reference memory. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |
| Alignment check, #AC | | S | X | Unaligned memory reference when alignment checking enabled. |
| SIMD floating-point, #XF | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. |
| SIMD Floating-Point Exceptions | | | | |
| Invalid operation, IE | S | S | X | A source operand was an SNaN value. |
| | S | S | X | Undefined operation. |
| Denormalized operand, DE | S | S | X | A source operand was a denormal value. |
| Division by zero, ZE | S | S | X | Division of finite dividend by zero-value divisor. |
| Overflow, OE | S | S | X | Rounded result too large to fit into the format of the destination operand. |
| Underflow, UE | S | S | X | Rounded result too small to fit into the format of the destination operand. |
| Precision, PE | S | S | X | A result could not be represented exactly in the destination format. |

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
Class 3A-1 — AVX / SSE Scalar (SIMD 000011, VEX.vvvv != 1111b)

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | VEX.vvvv != 1111b. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | | | X | Null data segment used to reference memory. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |
| Alignment check, #AC | | S | X | Unaligned memory reference when alignment checking enabled. |
| SIMD floating-point, #XF | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. |
| SIMD Floating-Point Exceptions | | | | |
| Invalid operation, IE | S | S | X | A source operand was an SNaN value. |
| | S | S | X | Undefined operation. |
| Denormalized operand, DE | S | S | X | A source operand was a denormal value. |

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
Class 3A-2 — AVX / SSE Scalar (SIMD 100001, VEX.vvvv != 1111b)

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | VEX.vvvv != 1111b. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | | | X | Null data segment used to reference memory. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |
| Alignment check, #AC | | S | X | Unaligned memory reference when alignment checking enabled. |
| SIMD floating-point, #XF | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. |
| SIMD Floating-Point Exceptions | | | | |
| Invalid operation, IE | S | S | X | A source operand was an SNaN value. |
| | S | S | X | Undefined operation. |
| Precision, PE | S | S | X | A result could not be represented exactly in the destination format. |

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
Class 4 — AVX / SSE Vector

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | S | S | S | Memory operand not 16-byte aligned and MXCSR.MM = 0. |
| | | | X | Null data segment used to reference memory. |
| Alignment check, #AC | S | S | S | Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. |
| | | | A | Memory operand not 16-byte aligned when alignment checking enabled. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
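
The #GP and #AC rows of the Class 4 table treat a misaligned 16-byte memory operand differently for SSE and AVX forms. The Python sketch below restates those rows as a decision function. It collapses "alignment checking enabled" into a single boolean supplied by the caller, which is a simplification made for this example, and the function name is hypothetical.

```python
def misaligned_16_byte_fault(address: int, is_avx: bool, mxcsr_mm: bool,
                             alignment_checking: bool):
    """Return '#GP', '#AC', or None for a 16-byte memory operand (Class 4 rows)."""
    if address % 16 == 0:
        return None                      # aligned operands raise neither fault
    if is_avx:
        # AVX form: only the #AC row applies, and only with alignment checking on.
        return "#AC" if alignment_checking else None
    if not mxcsr_mm:
        return "#GP"                     # SSE form with MXCSR.MM = 0
    # SSE form with MXCSR.MM = 1: misalignment is tolerated unless checking is on.
    return "#AC" if alignment_checking else None


if __name__ == "__main__":
    print(misaligned_16_byte_fault(0x1008, is_avx=False, mxcsr_mm=False,
                                   alignment_checking=False))
```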
Class 4A — AVX / SSE Vector (VEX.W = 1)

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | VEX.W = 1. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | | | X | Null data segment used to reference memory. |
| Alignment check, #AC | S | S | S | Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. |
| | | | A | Memory operand not 16-byte aligned when alignment checking enabled. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
Class 4B — AVX / SSE Vector (VEX.L = 1)

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | VEX.L = 1. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | | | X | Null data segment used to reference memory. |
| Alignment check, #AC | S | S | S | Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. |
| | | | A | Memory operand not 16-byte aligned when alignment checking enabled. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
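
The #UD rows above reject a VEX-encoded instruction when a REX, F2, F3, or 66 prefix precedes the VEX prefix, and reject a Lock prefix (F0h) ahead of the opcode. A minimal Python sketch of that check follows. It assumes the caller has already separated out the bytes that precede the VEX prefix and that REX prefixes occupy 40h through 4Fh in 64-bit mode; the function names are hypothetical.

```python
LEGACY_PREFIXES = {0x66, 0xF2, 0xF3}  # operand-size and repeat prefixes named in the table
LOCK_PREFIX = 0xF0


def is_rex(byte: int) -> bool:
    """REX prefixes occupy 40h-4Fh; they act as prefixes only in 64-bit mode."""
    return 0x40 <= byte <= 0x4F


def vex_prefix_causes_ud(preceding: bytes, in_64bit_mode: bool = True) -> bool:
    """True if any byte placed before the VEX prefix would force #UD."""
    for byte in preceding:
        if byte in LEGACY_PREFIXES or byte == LOCK_PREFIX:
            return True
        if in_64bit_mode and is_rex(byte):
            return True
    return False  # other prefixes, such as segment overrides, are not flagged here


if __name__ == "__main__":
    print(vex_prefix_causes_ud(bytes([0x66])))  # 66h ahead of VEX
    print(vex_prefix_causes_ud(bytes([0x2E])))  # segment override only
```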
Class 4B-X — SSE / AVX / AVX2 (VEX.L = 1 && !AVX2)

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | VEX.L = 1 when AVX2 not supported. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | | | X | Null data segment used to reference memory. |
| Alignment check, #AC | S | S | S | Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. |
| | | | A | Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |

X — SSE, AVX, and AVX2 exception

A — AVX, AVX2 exception 

S — SSE exception 
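
For the AVX and AVX2 forms in this class, the #AC row ties the required alignment to the operand width: 32 bytes for a 256-bit memory operand and 16 bytes for a 128-bit one. The short Python sketch below expresses that rule. The function name is hypothetical, and "alignment checking enabled" is again reduced to a boolean chosen by the caller.

```python
REQUIRED_ALIGNMENT = {128: 16, 256: 32}  # operand width in bits -> alignment in bytes


def avx_alignment_check_fault(address: int, operand_bits: int,
                              alignment_checking: bool) -> bool:
    """True if the AVX/AVX2 #AC row of the Class 4B-X table applies."""
    required = REQUIRED_ALIGNMENT[operand_bits]
    return alignment_checking and address % required != 0


if __name__ == "__main__":
    # 0x1010 is 16-byte aligned but not 32-byte aligned.
    print(avx_alignment_check_fault(0x1010, 256, True))
    print(avx_alignment_check_fault(0x1010, 128, True))
```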
Class 4C — AVX / SSE Vector (VEX.vvvv != 1111b)

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | VEX.vvvv != 1111b. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | | | X | Null data segment used to reference memory. |
| Alignment check, #AC | S | S | S | Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. |
| | | | A | Memory operand not 16-byte aligned when alignment checking enabled. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
Class 4C-1 — AVX / SSE Vector (write to RO memory, VEX.vvvv != 1111b)

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | VEX.vvvv != 1111b. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | S | S | X | Write to a read-only data segment. |
| | | | X | Null data segment used to reference memory. |
| Alignment check, #AC | S | S | X | Memory operand not 16-byte aligned when alignment checking enabled. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
Class 4D — AVX / SSE Vector (VEX.vvvv != 1111b, VEX.L = 1)

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | VEX.vvvv != 1111b. |
| | | | A | VEX.L = 1. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | | | X | Null data segment used to reference memory. |
| Alignment check, #AC | S | S | S | Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. |
| | | | A | Memory operand not 16-byte aligned when alignment checking enabled. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
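
The Class 4D conditions refer to the encoded VEX.vvvv and VEX.L fields. As a sketch, the Python code below extracts those fields from a two-byte (C5h) or three-byte (C4h) VEX prefix and applies the two Class 4D #UD rows. It assumes the usual VEX layout, in which the last prefix byte holds vvvv in bits 6:3, L in bit 2, and pp in bits 1:0, with W in bit 7 of the three-byte form only. The function names are hypothetical.

```python
def vex_fields(prefix: bytes) -> dict:
    """Decode W, vvvv, L, and pp from a 2-byte (C5h) or 3-byte (C4h) VEX prefix."""
    if len(prefix) == 2 and prefix[0] == 0xC5:
        payload, w = prefix[1], 0        # the two-byte form has no W bit (treated as 0)
    elif len(prefix) == 3 and prefix[0] == 0xC4:
        payload = prefix[2]
        w = payload >> 7
    else:
        raise ValueError("not a VEX prefix")
    return {
        "W": w,
        "vvvv": (payload >> 3) & 0xF,    # encoded value; 1111b means no register operand
        "L": (payload >> 2) & 1,
        "pp": payload & 0b11,
    }


def class_4d_ud(prefix: bytes) -> bool:
    """True if VEX.vvvv != 1111b or VEX.L = 1 (the Class 4D #UD conditions)."""
    fields = vex_fields(prefix)
    return fields["vvvv"] != 0b1111 or fields["L"] == 1


if __name__ == "__main__":
    print(vex_fields(bytes([0xC5, 0xF8])))
    print(class_4d_ud(bytes([0xC5, 0xFC])))  # L = 1
```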
Class 4D-X — SSE / AVX / AVX2 Vector (VEX.vvvv != 1111b, (VEX.L = 1 && !AVX2))

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | VEX.vvvv != 1111b. |
| | | | A | VEX.L = 1 when AVX2 not supported. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | | | X | Null data segment used to reference memory. |
| Alignment check, #AC | S | S | S | Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. |
| | | | A | Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |

X — SSE, AVX, and AVX2 exception

A — AVX, AVX2 exception 

S — SSE exception 
Class 4E — AVX / SSE Vector (VEX.W = 1, VEX.L = 1)

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | VEX.W = 1. |
| | | | A | VEX.L = 1. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | | | X | Null data segment used to reference memory. |
| Alignment check, #AC | S | S | S | Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. |
| | | | A | Memory operand not 16-byte aligned when alignment checking enabled. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
Class 4E-X — SSE / AVX / AVX2 Vector (VEX.W = 1, (VEX.L = 1 && !AVX2))

Exceptions

| Exception | Mode: Real | Mode: Virt | Mode: Prot | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier. |
| | A | A | | AVX instructions are only recognized in protected mode. |
| | S | S | S | CR0.EM = 1. |
| | S | S | S | CR4.OSFXSR = 0. |
| | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. |
| | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b. |
| | | | A | VEX.W = 1. |
| | | | A | VEX.L = 1 when AVX2 not supported. |
| | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix. |
| | S | S | X | Lock prefix (F0h) preceding opcode. |
| Device not available, #NM | S | S | X | CR0.TS = 1. |
| Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical. |
| General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical. |
| | | | X | Null data segment used to reference memory. |
| Alignment check, #AC | S | S | S | Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. |
| | | | A | Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. |
| Page fault, #PF | | S | X | Instruction execution caused a page fault. |

X — SSE, AVX, and AVX2 exception

A — AVX, AVX2 exception 

S — SSE exception 
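
The AVX-form alignment-check row of the class above can be sketched as a small predicate. This is only an illustration under stated assumptions: the function name and parameters are hypothetical, `ac_enabled` stands in for "alignment checking enabled" without modeling how that state is established, and only the 128-bit and 256-bit operand sizes named in the table are handled.

```python
def avx_alignment_check_applies(address: int, operand_bits: int, ac_enabled: bool) -> bool:
    """Return True when the AVX #AC cause in the table would apply.

    Hypothetical helper: a 128-bit operand must be 16-byte aligned and a
    256-bit operand must be 32-byte aligned, and the cause only applies
    while alignment checking is enabled.
    """
    if operand_bits not in (128, 256):
        raise ValueError("operand_bits must be 128 or 256")
    if not ac_enabled:
        return False
    required_alignment = operand_bits // 8  # 16 bytes or 32 bytes
    return address % required_alignment != 0
```

An address such as 0x1010 is 16-byte aligned but not 32-byte aligned, so under this sketch it satisfies the 128-bit requirement and violates the 256-bit one.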

Class 4F — AVX / SSE (VEX.L = 1)


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 

X 

X 

X 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 

S 

S 

S 

CR0.EM = 1.

S

S

S

CR4.OSFXSR = 0.



A

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A 

VEX.L = 1. 



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 

S 

S 

X 

Lock prefix (F0h) preceding opcode.

Device not available, #NM

S

S

X

CR0.TS = 1.

Stack, #SS 



A 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 



A 

Memory address exceeding data segment limit or non-canonical. 



A 

Null data segment used to reference memory. 

Alignment check, #AC 

S 

S 

S 

Memory operand not 16-byte aligned when alignment checking enabled 
and MXCSR.MM = 1. 



A 

Memory operand not 16-byte aligned when alignment checking enabled. 

Page fault, #PF 



A 

Instruction execution caused a page fault. 

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 

Class 4F-X — SSE / AVX / AVX2 Vector (VEX.L = 1 && !AVX2)


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 

X 

X 

X 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 

S 

S 

S 

CR0.EM = 1.

S

S

S

CR4.OSFXSR = 0.



A

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A 

VEX.L = 1 when AVX2 not supported. 



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 

S 

S 

X 

Lock prefix (F0h) preceding opcode.

Device not available, #NM

S

S

X

CR0.TS = 1.

Stack, #SS 



A 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 



A 

Memory address exceeding data segment limit or non-canonical. 



A 

Null data segment used to reference memory. 

Alignment check, #AC 

S 

S 

S 

Memory operand not 16-byte aligned when alignment checking enabled 
and MXCSR.MM = 1. 



A 

When alignment checking enabled: 

• 128-bit memory operand not 16-byte aligned. 

• 256-bit memory operand not 32-byte aligned. 

Page fault, #PF 



A 

Instruction execution caused a page fault. 

X — AVX, AVX2, and SSE exception

A — AVX and AVX2 exception 

S — SSE exception 

Class 4G — AVX Vector (VEX.W = 1, VEX.vvvv != 1111b)


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 

X 

X 

X 

Instruction not supported, as indicated by CPUID feature identifier. 

X 

X 


AVX instructions are only recognized in protected mode. 

X 

X 

X 

CR0.EM = 1.

X

X

X

CR4.OSFXSR = 0.



X

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



X

XFEATURE_ENABLED_MASK[2:1] != 11b.



X

VEX.W = 1.



X

VEX.vvvv != 1111b.



X 

REX, F2, F3, or 66 prefix preceding VEX prefix. 

X 

X 

X 

Lock prefix (F0h) preceding opcode.

Device not available, #NM

X

X

X

CR0.TS = 1.

Stack, #SS 

X 

X 

X 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 

X 

X 

X 

Memory address exceeding data segment limit or non-canonical. 



X 

Null data segment used to reference memory. 

Alignment check, #AC 

S

S

S

Memory operand not 16-byte aligned when alignment checking enabled 
and MXCSR.MM = 1. 



A 

Memory operand not 16-byte aligned when alignment checking enabled. 

Page fault, #PF 


X 

X 

Instruction execution caused a page fault. 

X — AVX exception 
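
The two field-based #UD causes of this class (VEX.W = 1 and VEX.vvvv != 1111b) can be illustrated with a small decoder for the final VEX byte. The sketch below is hypothetical and simplified: the function names are invented for illustration, it assumes the usual layout in which the last byte of a three-byte (C4h) prefix holds W, vvvv, L, and pp, and it treats the two-byte (C5h) form as having W = 0. The vvvv value is compared as encoded, which is how the table states the condition.

```python
def decode_vex_fields(prefix: bytes) -> dict:
    """Extract W, vvvv, L and pp from a 2-byte (C5h) or 3-byte (C4h) VEX prefix."""
    if len(prefix) == 3 and prefix[0] == 0xC4:
        last = prefix[2]
        w = (last >> 7) & 1
    elif len(prefix) == 2 and prefix[0] == 0xC5:
        last = prefix[1]
        w = 0  # assumption: the two-byte form carries no W bit, treated as 0
    else:
        raise ValueError("not a 2-byte or 3-byte VEX prefix")
    return {
        "W": w,
        "vvvv": (last >> 3) & 0xF,  # raw encoded field, not inverted
        "L": (last >> 2) & 1,
        "pp": last & 0x3,
    }


def class_4g_field_causes(prefix: bytes) -> list:
    """List which of the two field-based #UD causes of Class 4G would apply."""
    fields = decode_vex_fields(prefix)
    causes = []
    if fields["W"] == 1:
        causes.append("VEX.W = 1")
    if fields["vvvv"] != 0b1111:
        causes.append("VEX.vvvv != 1111b")
    return causes
```

The other #UD causes in the table (CPUID support, operating mode, control-register state, and illegal prefixes) are deliberately left out of this sketch.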

Class 4H — AVX, 256-bit only (VEX.L = 0; No SIMD Exceptions)


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 

A 

A 

A 

Instruction not supported, as indicated by CPUID feature identifier. 


A 

A 


AVX instructions are only recognized in protected mode. 


A 

A 

A 

CR0.EM = 1.


A

A

A

CR4.OSFXSR = 0.




A

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].




A

XFEATURE_ENABLED_MASK[2:1] != 11b.




A

VEX.L = 0.




A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 


A 

A 

A 

Lock prefix (F0h) preceding opcode.

Device not available, #NM

A

A

A

CR0.TS = 1.

Stack, #SS 

A 

A 

A 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 

A 

A 

A 

Memory address exceeding data segment limit or non-canonical. 



A 

Null data segment used to reference memory. 

Alignment check, #AC 



A 

Memory operand not 16-byte aligned when alignment checking enabled. 

Page fault, #PF 


A 

A 

Instruction execution caused a page fault. 

A — AVX2 exception 

Class 4H-1 — AVX2, 256-bit only (VEX.L = 0, VEX.vvvv != 1111b)


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 

A 

A 

A 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 

A 

A 

A 

CR0.EM = 1.

A

A

A

CR4.OSFXSR = 0.



A

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A

VEX.L = 0.



A

VEX.vvvv != 1111b.



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 

A 

A 

A 

Lock prefix (F0h) preceding opcode.

Device not available, #NM

A

A

A

CR0.TS = 1.

Stack, #SS 

A 

A 

A 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 

A 

A 

A 

Memory address exceeding data segment limit or non-canonical. 



A 

Null data segment used to reference memory. 

Alignment check, #AC 



A 

Memory operand not 16-byte aligned when alignment checking enabled. 

Page fault, #PF 


A 

A 

Instruction execution caused a page fault. 

A — AVX2 exception 


AMD64 Technology 


26568 — Rev. 3.23—February 2019 


Class 4J — AVX2 (VEX.W = 1) 


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 

A 

A 

A 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 



A 

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A 

VEX.W = 1. 



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 



A 

Lock prefix (F0h) preceding opcode.

Device not available, #NM



A

CR0.TS = 1.

Stack, #SS 



A 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 



A 

Memory address exceeding data segment limit or non-canonical. 



A 

Null data segment used to reference memory. 

Alignment check, #AC 



A 

Alignment checking enabled and: 

256-bit memory operand not 32-byte aligned or 

128-bit memory operand not 16-byte aligned. 

Page fault, #PF 



A 

Instruction execution caused a page fault. 

A — AVX2 exception 

Class 4K — AVX2


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 

A 

A 

A 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 



A 

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 



A 

Lock prefix (F0h) preceding opcode.

Device not available, #NM



A

CR0.TS = 1.

Stack, #SS 



A 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 



A 

Memory address exceeding data segment limit or non-canonical. 



A 

Null data segment used to reference memory. 

Alignment check, #AC 



A 

Alignment checking enabled and: 

256-bit memory operand not 32-byte aligned or 

128-bit memory operand not 16-byte aligned. 

Page fault, #PF 



A 

Instruction execution caused a page fault. 

A — AVX2 exception 
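
The two operating-system enablement causes that recur in these tables (the OSXSAVE indication and XFEATURE_ENABLED_MASK[2:1] not equal to 11b) can be sketched as a single check. This is a minimal illustration rather than a complete detection routine: the function name is hypothetical, the caller is assumed to have already obtained the ECX value of CPUID Fn0000_0001 and the XFEATURE_ENABLED_MASK value, and the position of OSXSAVE at ECX bit 27 is stated here as an assumption because the table does not give it.

```python
OSXSAVE_BIT = 27  # assumed bit position of CPUID Fn0000_0001_ECX[OSXSAVE]


def avx_state_enabled(cpuid_fn1_ecx: int, xfeature_enabled_mask: int) -> bool:
    """Return True when neither OS-enablement #UD cause applies.

    cpuid_fn1_ecx:         ECX as returned by CPUID Fn0000_0001 (caller-supplied).
    xfeature_enabled_mask: value of XFEATURE_ENABLED_MASK (caller-supplied).
    """
    if ((cpuid_fn1_ecx >> OSXSAVE_BIT) & 1) == 0:
        return False  # corresponds to the CR4.OSXSAVE = 0 row
    # Bits [2:1] must both be set (11b) for the second row not to apply.
    return ((xfeature_enabled_mask >> 1) & 0b11) == 0b11
```

A feature-support check against the relevant CPUID feature identifier would still be needed separately, since that is its own row in each table.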

Class 5 — AVX / SSE Scalar


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 


Invalid opcode, #UD

X

X

X

Instruction not supported, as indicated by CPUID feature identifier.

A

A


AVX instructions are only recognized in protected mode.

S

S

S

CR0.EM = 1.

S

S

S

CR4.OSFXSR = 0.



A

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.




A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 


S 

S 

X 

Lock prefix (F0h) preceding opcode.

Device not available, #NM

S

S

X

CR0.TS = 1.

Stack, #SS 

S 

S 

X 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 

S 

S 

X 

Memory address exceeding data segment limit or non-canonical. 



X 

Null data segment used to reference memory. 

Page fault, #PF 


S 

X 

Instruction execution caused a page fault. 

Alignment check, #AC 


S 

X 

Unaligned memory reference when alignment checking enabled. 

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
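
The X / A / S legend used throughout these tables can be read mechanically. The following sketch is purely illustrative and uses hypothetical names: given the marker found in one mode column of a row and the form of the instruction being executed ("SSE" for the legacy encoding, "AVX" for the VEX encoding), it reports whether that cause applies. An empty cell is taken to mean the cause does not apply in that mode.

```python
def cause_applies(marker: str, form: str) -> bool:
    """Interpret one table cell (X, A, S, or blank) for a given instruction form."""
    if form not in ("SSE", "AVX"):
        raise ValueError("form must be 'SSE' or 'AVX'")
    marker = marker.strip().upper()
    if marker == "":
        return False          # blank cell: no exception from this cause in this mode
    if marker == "X":
        return True           # applies to both forms
    if marker == "A":
        return form == "AVX"  # VEX-encoded form only
    if marker == "S":
        return form == "SSE"  # legacy form only
    raise ValueError("unknown marker: " + marker)
```

For example, the stack-limit row of the table above carries S, S, X in the Real, Virt, and Prot columns, so under this reading the cause applies to the legacy form in all three modes and to the VEX-encoded form only in protected mode.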

Class 5A — AVX / SSE Scalar (VEX.L = 1)


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 

X 

X 

X 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 

S 

S 

S 

CR0.EM = 1.

S

S

S

CR4.OSFXSR = 0.



A

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A 

VEX.L = 1. 



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 

S 

S 

X 

Lock prefix (F0h) preceding opcode.

Device not available, #NM

S

S

X

CR0.TS = 1.

Stack, #SS 

S 

S 

X 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 

S 

S 

X 

Memory address exceeding data segment limit or non-canonical. 



X 

Null data segment used to reference memory. 

Page fault, #PF 


S 

X 

Instruction execution caused a page fault. 

Alignment check, #AC 


S 

X 

Unaligned memory reference when alignment checking enabled. 

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 

Class 5B — AVX / SSE Scalar (VEX.vvvv != 1111b)


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 

X 

X 

X 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 

S 

S 

S 

CR0.EM = 1.

S

S

S

CR4.OSFXSR = 0.



A

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A

VEX.vvvv != 1111b.



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 

S 

S 

X 

Lock prefix (F0h) preceding opcode.

Device not available, #NM

S

S

X

CR0.TS = 1.

Stack, #SS 

S 

S 

X 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 

S 

S 

X 

Memory address exceeding data segment limit or non-canonical. 



X 

Null data segment used to reference memory. 

Page fault, #PF 


S 

X 

Instruction execution caused a page fault. 

Alignment check, #AC 


S 

X 

Unaligned memory reference with alignment checking enabled. 

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 

Class 5C — AVX / SSE Scalar (VEX.vvvv != 1111b, VEX.L = 1)


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 

X 

X 

X 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 

S 

S 

S 

CR0.EM = 1.

S

S

S

CR4.OSFXSR = 0.



A

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A

VEX.vvvv != 1111b.



A 

VEX.L = 1. 



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 

S 

S 

X 

Lock prefix (F0h) preceding opcode.

Device not available, #NM

S

S

X

CR0.TS = 1.

Stack, #SS 

S 

S 

X 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 

S 

S 

X 

Memory address exceeding data segment limit or non-canonical. 



X 

Null data segment used to reference memory. 

Page fault, #PF 


S 

X 

Instruction execution caused a page fault. 

Alignment check, #AC 


S 

X 

Unaligned memory reference when alignment checking enabled. 

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 

Class 5C-X — SSE / AVX / AVX2 Scalar (VEX.vvvv != 1111b, (VEX.L = 1 && !AVX2))


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 

X 

X 

X 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 

S 

S 

S 

CR0.EM = 1.

S

S

S

CR4.OSFXSR = 0.



A

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A

VEX.vvvv != 1111b.



A 

VEX.L = 1 when AVX2 not supported. 



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 

S 

S 

X 

Lock prefix (F0h) preceding opcode.

Device not available, #NM

S

S

X

CR0.TS = 1.

Stack, #SS 

S 

S 

X 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 

S 

S 

X 

Memory address exceeding data segment limit or non-canonical. 



X 

Null data segment used to reference memory. 

Page fault, #PF 


S 

X 

Instruction execution caused a page fault. 

Alignment check, #AC 


S 

X 

Unaligned memory reference when alignment checking enabled. 

X — SSE, AVX, and AVX2 exception

A — AVX, AVX2 exception 

S — SSE exception 

Class 5C-1 — AVX / SSE Scalar (write to RO memory, VEX.vvvv != 1111b, VEX.L = 1)


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 

X 

X 

X 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 

S 

S 

S 

CR0.EM = 1.

S

S

S

CR4.OSFXSR = 0.



A

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A

VEX.vvvv != 1111b.



A 

VEX.L = 1. 



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 

S 

S 

X 

Lock prefix (F0h) preceding opcode.

Device not available, #NM

S

S

X

CR0.TS = 1.

Stack, #SS 

S 

S 

X 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 

S 

S 

X 

Memory address exceeding data segment limit or non-canonical. 

S 

S 

X 

Write to a read-only data segment. 



X 

Null data segment used to reference memory. 

Page fault, #PF 


S 

X 

Instruction execution caused a page fault. 

Alignment check, #AC 


S 

X 

Unaligned memory reference when alignment checking enabled. 

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 

Class 5D — AVX / SSE Scalar (write to RO memory, VEX.vvvv != 1111b (variant))


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 

X 

X 

X 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 

S 

S 

S 

CR0.EM = 1.

S

S

S

CR4.OSFXSR = 0.



A

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A

VEX.vvvv != 1111b (for memory destination encoding only).



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 

S 

S 

X 

Lock prefix (F0h) preceding opcode.

Device not available, #NM

S

S

X

CR0.TS = 1.

Stack, #SS 

S 

S 

X 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 

S 

S 

X 

Memory address exceeding data segment limit or non-canonical. 

S 

S 

X 

Write to a read-only data segment. 



X 

Null data segment used to reference memory. 

Page fault, #PF 


S 

X 

Instruction execution caused a page fault. 

Alignment check, #AC 


S 

X 

Unaligned memory reference when alignment checking enabled. 

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 

Class 5E — AVX / SSE Scalar (write to RO, VEX.vvvv != 1111b (variant), VEX.L = 1)


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 

X 

X 

X 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 

S 

S 

S 

CR0.EM = 1.

S

S

S

CR4.OSFXSR = 0.



A

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A

VEX.vvvv != 1111b (for memory destination encoding only).



A 

VEX.L = 1. 



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 

S 

S 

X 

Lock prefix (F0h) preceding opcode.

Device not available, #NM

S

S

X

CR0.TS = 1.

Stack, #SS 

S 

S 

X 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 

S 

S 

X 

Memory address exceeding data segment limit or non-canonical. 

S 

S 

X 

Write to a read-only data segment. 



X 

Null data segment used to reference memory. 

Page fault, #PF 


S 

X 

Instruction execution caused a page fault. 

Alignment check, #AC 


S 

X 

Unaligned memory reference when alignment checking enabled. 

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 

Class 6 — AVX Mixed Memory Argument


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 



A 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 



A 

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 



A 

Lock prefix (F0h) preceding opcode.

Device not available, #NM



A

CR0.TS = 1.

Stack, #SS 



A 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 



A 

Memory address exceeding data segment limit or non-canonical. 



A 

Null data segment used to reference memory. 

Page fault, #PF 



A 

Instruction execution caused a page fault. 

Alignment check, #AC 



A 

Unaligned memory reference when alignment checking enabled. 

A — AVX exception. 

Class 6A — AVX Mixed Memory Argument (VEX.W = 1)


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 



A 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 



A 

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A 

VEX.W = 1. 



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 



A 

Lock prefix (F0h) preceding opcode.

Device not available, #NM



A

CR0.TS = 1.

Stack, #SS 



A 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 



A 

Memory address exceeding data segment limit or non-canonical. 



A 

Null data segment used to reference memory. 

Page fault, #PF 



A 

Instruction execution caused a page fault. 

Alignment check, #AC 



A 

Unaligned memory reference when alignment checking enabled. 

A — AVX exception. 

Class 6A-1 — AVX Mixed Memory Argument (write to RO memory, VEX.W = 1)


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 



A 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 



A 

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A 

VEX.W = 1. 



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 



A 

Lock prefix (F0h) preceding opcode.

Device not available, #NM



A

CR0.TS = 1.

Stack, #SS 



A 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 



A 

Memory address exceeding data segment limit or non-canonical. 



A 

Null data segment used to reference memory. 

S 

S 

X 

Write to a read-only data segment. 

Page fault, #PF 



A 

Instruction execution caused a page fault. 

A — AVX exception. 

Class 6B — AVX Mixed Memory Argument (VEX.W = 1, VEX.L = 0)


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 



A 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 



A 

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A 

VEX.W = 1. 



A 

VEX.L = 0. 



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 



A 

Lock prefix (F0h) preceding opcode.

Device not available, #NM



A

CR0.TS = 1.

Stack, #SS 



A 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 



A 

Memory address exceeding data segment limit or non-canonical. 



A 

Null data segment used to reference memory. 

Page fault, #PF 



A 

Instruction execution caused a page fault. 

Alignment check, #AC 



A 

Memory operand not 16-byte aligned when alignment checking enabled. 

A — AVX exception. 



Class 6B-1 — AVX Mixed Memory Argument (write to RO, VEX.W = 1, VEX.L = 0) 


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 



A 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 



A 

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A 

VEX.W = 1. 



A 

VEX.L = 0. 



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 



A 

Lock prefix (F0h) preceding opcode.

Device not available, #NM



A

CR0.TS = 1.

Stack, #SS 



A 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 



A 

Memory address exceeding data segment limit or non-canonical. 



A 

Write to a read-only data segment. 



A 

Null data segment used to reference memory. 

Page fault, #PF 



A 

Instruction execution caused a page fault. 

Alignment check, #AC 



A 

Memory operand not 16-byte aligned when alignment checking enabled. 

A — AVX exception. 

Class 6C — AVX Mixed Memory Argument (VEX.W = 1, VEX.L = 0, VEX.vvvv != 1111b)


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 



A 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 



A 

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A

VEX.W = 1.



A

VEX.vvvv != 1111b.



A 

VEX.L = 0. 



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 



A 

Lock prefix (F0h) preceding opcode.

Device not available, #NM



A

CR0.TS = 1.

Stack, #SS 



A 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 



A 

Memory address exceeding data segment limit or non-canonical. 



A 

Null data segment used to reference memory. 

Page fault, #PF 



A 

Instruction execution caused a page fault. 

Alignment check, #AC 



A 

Unaligned memory reference when alignment checking enabled. 

A — AVX exception. 

Class 6C-X — AVX / AVX2 (W = 1, vvvv != 1111b, L = 0, (reg src op specified && !AVX2))


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 



A 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 



A 

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A

VEX.W = 1.



A

VEX.vvvv != 1111b.



A 

VEX.L = 0. 



A 

Register-based source operand specified when AVX2 not supported. 



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 



A 

Lock prefix (F0h) preceding opcode.

Device not available, #NM



A

CR0.TS = 1.

Stack, #SS 



A 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 



A 

Memory address exceeding data segment limit or non-canonical. 



A 

Null data segment used to reference memory. 

Page fault, #PF 



A 

Instruction execution caused a page fault. 

Alignment check, #AC 



A 

Unaligned memory reference when alignment checking enabled. 

A — AVX, AVX2 exception. 

Class 6D — AVX Mixed Memory Argument (VEX.W = 1, VEX.vvvv != 1111b)


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 



A 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 



A 

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A

VEX.W = 1.



A

VEX.vvvv != 1111b.



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 



A 

Lock prefix (F0h) preceding opcode.

Device not available, #NM



A

CR0.TS = 1.

Stack, #SS 



A 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 



A 

Memory address exceeding data segment limit or non-canonical. 



A 

Null data segment used to reference memory. 

Page fault, #PF 



A 

Instruction execution caused a page fault. 

Alignment check, #AC 



A 

Unaligned memory reference when alignment checking enabled. 

A — AVX exception. 

Class 6D-X — AVX / AVX2 (W = 1, vvvv != 1111b, (ModRM.mod = 11b && !AVX2))


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 



A 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 



A 

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A

VEX.W = 1.



A

VEX.vvvv != 1111b.



A 

MODRM.mod = 11b when AVX2 not supported. 



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 



A 

Lock prefix (F0h) preceding opcode.

Device not available, #NM



A

CR0.TS = 1.

Stack, #SS 



A 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 



A 

Memory address exceeding data segment limit or non-canonical. 



A 

Null data segment used to reference memory. 

Page fault, #PF 



A 

Instruction execution caused a page fault. 

Alignment check, #AC 



A 

Unaligned memory reference when alignment checking enabled. 

A — AVX, AVX2 exception. 

Class 6E — AVX Mixed Memory Argument (VEX.W = 1, VEX.vvvv != 1111b (variant))


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 



A 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 



A 

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A

VEX.W = 1.



A

VEX.vvvv != 1111b (for versions with immediate byte operand only).



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 



A 

Lock prefix (F0h) preceding opcode.

Device not available, #NM



A

CR0.TS = 1.

Stack, #SS 



A 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 



A 

Memory address exceeding data segment limit or non-canonical. 



A 

Null data segment used to reference memory. 

Page fault, #PF 



A 

Instruction execution caused a page fault. 

Alignment check, #AC 



A 

Unaligned memory reference when alignment checking enabled. 

A — AVX exception. 

Class 6F — AVX2 (VEX.W = 1, VEX.vvvv != 1111b, VEX.L = 0, ModRM.mod = 11b)


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 



A 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 



A 

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A

VEX.W = 1.



A

VEX.vvvv != 1111b.



A 

VEX.L = 0. 



A 

Register-based source operand specified (MODRM.mod = 11b).



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 



A 

Lock prefix (F0h) preceding opcode.

Device not available, #NM



A

CR0.TS = 1.

Stack, #SS 



A 

Memory address exceeding stack segment limit or non-canonical. 

General protection, #GP 



A 

Memory address exceeding data segment limit or non-canonical. 



A 

Null data segment used to reference memory. 

Page fault, #PF 



A 

Instruction execution caused a page fault. 

Alignment check, #AC 



A 

Unaligned memory reference when alignment checking enabled. 

A — AVX exception. 

Class 7 — AVX / SSE No Memory Argument


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 

X 

X 

X 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 

S 

S 

S

CR0.EM = 1.

S

S

S

CR4.OSFXSR = 0.



A

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 

X 

X 

X 

Lock prefix (F0h) preceding opcode.

Device not available, #NM

S

S

X

CR0.TS = 1.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
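
For the legacy SSE form, the control-register rows of this class reduce to three bit tests. The sketch below is an illustration with hypothetical names. It assumes the conventional bit positions CR0.EM = bit 2, CR0.TS = bit 3, and CR4.OSFXSR = bit 9, which are not stated in the table, and it assumes that an invalid-opcode condition is reported in preference to device-not-available when both conditions hold. The remaining #UD causes are omitted.

```python
CR0_EM = 1 << 2      # assumed position of CR0.EM
CR0_TS = 1 << 3      # assumed position of CR0.TS
CR4_OSFXSR = 1 << 9  # assumed position of CR4.OSFXSR


def class7_sse_control_exception(cr0: int, cr4: int):
    """Return '#UD', '#NM', or None for a legacy SSE instruction of Class 7.

    Only the control-register rows are modeled; checking #UD before #NM
    is an assumption of this sketch.
    """
    if cr0 & CR0_EM:
        return "#UD"   # CR0.EM = 1
    if not (cr4 & CR4_OSFXSR):
        return "#UD"   # CR4.OSFXSR = 0
    if cr0 & CR0_TS:
        return "#NM"   # CR0.TS = 1
    return None
```

Because instructions in this class take no memory argument, none of the #SS, #GP, #PF, or #AC rows appear in the table, so nothing further needs to be modeled for memory access.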

Class 7A — AVX / SSE No Memory Argument (VEX.L = 1)


Exceptions 


Exception 

Mode 

Cause of Exception 

Real 

Virt 

Prot 

Invalid opcode, #UD 

X 

X 

X 

Instruction not supported, as indicated by CPUID feature identifier. 

A 

A 


AVX instructions are only recognized in protected mode. 

S 

S 

S 

CR0.EM = 1.

S

S

S

CR4.OSFXSR = 0.



A

CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].



A

XFEATURE_ENABLED_MASK[2:1] != 11b.



A 

VEX.L = 1. 



A 

REX, F2, F3, or 66 prefix preceding VEX prefix. 

X 

X 

X 

Lock prefix (F0h) preceding opcode.

Device not available, #NM

S

S

X

CR0.TS = 1.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
Class 7A-X SSE / AVX / AVX2 Vector (VEX.L = 1 && !AVX2) 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.

X — SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
Class 7B — AVX / SSE No Memory Argument (VEX.vvvv != 1111b) 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
Class 7C — AVX / SSE No Memory Argument (VEX.vvvv != 1111b, VEX.L = 1) 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv field != 1111b.
                                        A     VEX.L field = 1.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
Class 7C-X SSE / AVX / AVX2 Vector (VEX.vvvv != 1111b, (VEX.L = 1 && !AVX2)) 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv field != 1111b.
                                        A     VEX.L field = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.

X — SSE, AVX, and AVX2 exception 

A — AVX, AVX2 exception 

S — SSE exception 
Class 8 — AVX No Memory Argument (VEX.vvvv != 1111b, VEX.W = 1) 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD                     A     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.W = 1.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A     Lock prefix (F0h) preceding opcode.
Device not available, #NM               A     CR0.TS = 1.

A — AVX exception. 
Class 9 — AVX 4-byte Argument (write to RO memory, VEX.vvvv != 1111b, VEX.L = 1) 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     VEX.L = 1.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     X     Write to a read-only data segment.
                            S     S     S     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
Class 9A — AVX 4-byte argument (reserved MBZ = 1, VEX.vvvv != 1111b, VEX.L = 1) 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     VEX.L = 1.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     S     Null data segment used to reference memory.
                            S     S     X     Attempt to load non-zero values into reserved MXCSR bits.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception 

A — AVX exception 

S — SSE exception 
Class 10 — XOP Base 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
Class 10A — XOP Base (XOP.L = 1) 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
Class 10B — XOP Base (XOP.W = 1, XOP.L = 1) 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
Class 10C — XOP Base (XOP.W = 1, XOP.vvvv != 1111b, XOP.L = 1) 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        A     XOP.vvvv != 1111b.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
Class 10D — XOP Base (SIMD 110011, XOP.vvvv != 1111b, XOP.W = 1) 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.vvvv != 1111b.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions below for details.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF    S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions 

Invalid operation, IE                   X     A source operand was an SNaN value.
                                        X     Undefined operation.
Denormalized operand, DE                X     A source operand was a denormal value.
Underflow, UE                           X     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           X     A result could not be represented exactly in the destination format.

X — XOP exception 
Class 10E — XOP Base (XOP.vvvv != 1111b (variant), XOP.L = 1) 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.vvvv != 1111b (for immediate operand variant only).
                                        X     XOP.L field = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception 
Class 11 — F16C Instructions 

Exceptions 

Exception                             Mode              Cause of Exception
                                      Real  Virt  Prot
Invalid opcode, #UD                               F     Instruction not supported, as indicated by CPUID feature identifier.
                                      F     F           AVX instructions are only recognized in protected mode.
                                                  F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                                  F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                                  F     VEX.W field = 1.
                                                  A     VEX.vvvv != 1111b.
                                                  F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                                  F     Lock prefix (F0h) preceding opcode.
                                                  F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                         F     CR0.TS = 1.
Stack, #SS                                        F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                           F     Memory address exceeding data segment limit or non-canonical.
                                                  F     Null data segment used to reference memory.
Alignment check, #AC                              F     Unaligned memory reference when alignment checking enabled.
Page fault, #PF                                   F     Instruction execution caused a page fault.
SIMD Floating-Point Exception, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions 

Invalid-operation exception (IE)                  F     A source operand was an SNaN value.
                                                  F     Undefined operation.
Denormalized-operand exception (DE)               F     A source operand was a denormal value.
Overflow exception (OE)                           F     Rounded result too large to fit into the format of the destination operand.
Underflow exception (UE)                          F     Rounded result too small to fit into the format of the destination operand.
Precision exception (PE)                          F     A result could not be represented exactly in the destination format.

F — F16C exception. 
Class 12 — AVX2 VSID (ModRM.mod = 11b, ModRM.rm != 100b) 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         A     A     A     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A     Lock prefix (F0h) preceding opcode.
                                        A     MODRM.mod = 11b.
                                        A     MODRM.rm != 100b.
                                        A     YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM               A     CR0.TS = 1.
Stack, #SS                              A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A     Memory address exceeding data segment limit or non-canonical.
                                        A     Null data segment used to reference memory.
Alignment check, #AC                    A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   A     A     Instruction execution caused a page fault.

A — AVX2 exception 
Class FMA-2 — FMA / FMA4 Vector (SIMD Exceptions PE, UE, OE, DE, IE) 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions 

Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception 
Class FMA-3 — FMA / FMA4 Scalar (SIMD Exceptions PE, UE, OE, DE, IE) 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions 

Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception 
XGETBV 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X     X     Lock prefix (F0h) preceding opcode.
General protection, #GP     X     X     X     ECX specifies a reserved or unimplemented XCR address.

X — exception generated 
XRSTOR 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X     X     CR4.OSFXSR = 0.
                            X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   X     X     X     CR0.TS = 1.
Stack, #SS                  X     X     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     X     X     X     Memory address exceeding data segment limit or non-canonical.
                            X     X     X     Null data segment used to reference memory.
                            X     X     X     Memory operand not aligned on 64-byte boundary.
                            X     X     X     Any must be zero (MBZ) bits in the save area were set.
                            X     X     X     Attempt to set reserved bits in MXCSR.
Page fault, #PF             X     X     X     Instruction execution caused a page fault.

X — exception generated 
XSAVE/XSAVEOPT 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X     X     CR4.OSFXSR = 0.
                            X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   X     X     X     CR0.TS = 1.
Stack, #SS                  X     X     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     X     X     X     Memory address exceeding data segment limit or non-canonical.
                            X     X     X     Null data segment used to reference memory.
                            X     X     X     Memory operand not aligned on 64-byte boundary.
                            X     X     X     Attempt to write read-only memory.
Page fault, #PF             X     X     X     Instruction execution caused a page fault.

X — exception generated 
XSETBV 

Exceptions 

Exception                   Mode              Cause of Exception
                            Real  Virt  Prot
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X     X     CR4.OSFXSR = 0.
                            X     X     X     Lock prefix (F0h) preceding opcode.
General protection, #GP     X     X     X     CPL != 0.
                            X     X     X     ECX specifies a reserved or unimplemented XCR address.
                            X     X     X     Any must be zero (MBZ) bits in the save area were set.
                            X     X     X     Writing 0 to XCR0.

X — exception generated 

Note: 

In virtual mode, only #UD for Instruction not supported and #GP for CPL != 0 are supported. 
Appendix A AES Instructions 


This appendix gives background information concerning the use of the AES instruction subset in the 
implementation of encryption compliant to the Advanced Encryption Standard (AES). 

A.1 AES Overview 

This section provides an overview of AMD64 instructions that support AES software implementation. 

The U.S. National Institute of Standards and Technology has adopted the Rijndael algorithm, a block 
cipher that processes 16-byte data blocks using a shared key of variable length, as the Advanced 
Encryption Standard (AES). The standard is defined in Federal Information Processing Standards 
Publication 197 (FIPS 197), Specification for the Advanced Encryption Standard (AES). There are 
three versions of the algorithm, based on key widths of 16 (AES-128), 24 (AES-192), and 32 (AES- 
256) bytes. 

The following AMD64 instructions support AES implementation: 

• AESDEC/VAESDEC and AESDECLAST/VAESDECLAST 

Perform one round of AES decryption 

• AESENC/VAESENC and AESENCLAST/VAESENCLAST 

Perform one round of AES encryption 

• AESIMC/VAESIMC 

Perform the AES InvMixColumn transformation 

• AESKEYGENASSIST/VAESKEYGENASSIST 

Assist AES round key generation 

• PCLMULQDQ/VPCLMULQDQ 

Perform carry-less multiplication 

See Chapter 2, “Instruction Reference” for detailed descriptions of the instructions. 

A.2 Coding Conventions 

This overview uses descriptive code that has the following basic characteristics. 

• Syntax and notation based on the C language 

• Four numerical data types: 

bool: The numbers 0 and 1, the values of the Boolean constants false and true 
nat: The infinite set of all natural numbers, including bool as a subtype 
int: The infinite set of all integers, including nat as a subtype 
rat: The infinite set of all rational numbers, including int as a subtype 
• Standard logical and arithmetic operators 

• Enumeration (enum) types, arrays, structures (struct), and union types 

• Global and local variable and constant declarations, initializations, and assignments 

• Standard control constructs (if, then, else, for, while, switch, break, and continue) 

• Function subroutines 

• Macro definitions (#define) 


A.3 AES Data Structures 

The AES instructions operate on 16-byte blocks of text called the state. Each block is represented as a 
4x4 matrix of bytes which is assigned the Galois field matrix data type (GFMatrix). In the AMD64 
implementation, the matrices are formatted as 16-byte vectors in XMM registers or 128-bit memory 
locations. This overview represents each matrix as a sequence of 16 bytes in little-endian format (least 
significant byte on the right and most significant byte on the left). 

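Figure A-1 shows a state block in 4 x 4 matrix representation. 

\[
\mathrm{GFMatrix} =
\begin{pmatrix}
X_{3,0} & X_{2,0} & X_{1,0} & X_{0,0} \\
X_{3,1} & X_{2,1} & X_{1,1} & X_{0,1} \\
X_{3,2} & X_{2,2} & X_{1,2} & X_{0,2} \\
X_{3,3} & X_{2,3} & X_{1,3} & X_{0,3}
\end{pmatrix}
\]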

Figure A-1. GFMatrix Representation of 16-byte Block 

Figure A-2 shows the AMD64 AES format, with the corresponding mapping of FIPS 197 AES 
“words” to operand bytes. 


XMM Register or 128-bit Memory Operand 

Bits      Byte contents (most significant first)   AES word
127:96    X3,3  X2,3  X1,3  X0,3                   AES Word 3
95:64     X3,2  X2,2  X1,2  X0,2                   AES Word 2
63:32     X3,1  X2,1  X1,1  X0,1                   AES Word 1
31:0      X3,0  X2,0  X1,0  X0,0                   AES Word 0

Each entry occupies one byte; X3,3 is in bits 127:120 and X0,0 is in bits 7:0. 

Figure A-2. GFMatrix to Operand Byte Mappings 
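
The mapping in Figure A-2 reduces to simple index arithmetic. As a minimal sketch (the helper names `operand_byte_index` and `state_byte` are hypothetical and not part of the manual's descriptive code), the byte of a little-endian 16-byte operand that holds element X i,j can be located as follows, where j selects the AES word and i selects the byte within that word:

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset (0 = bits 7:0, 15 = bits 127:120) of element X[i][j],
   where j is the AES word number and i the byte within that word. */
static unsigned operand_byte_index(unsigned i, unsigned j)
{
    return 4u * j + i;
}

/* Fetch element X[i][j] from a 16-byte operand stored in memory
   with the least significant byte first. */
static uint8_t state_byte(const uint8_t operand[16], unsigned i, unsigned j)
{
    return operand[operand_byte_index(i, j)];
}
```

For example, X0,0 is byte 0 of the operand, X1,0 is byte 1, and X3,3 is byte 15.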


A.4 Algebraic Preliminaries 

AES operations are based on the Galois field $GF = GF(2^8)$, of order 256, constructed by adjoining a 
root of the irreducible polynomial 

\[ p(X) = X^8 + X^4 + X^3 + X + 1 \]

to the field of two elements, $Z_2$. Equivalently, GF is the quotient field $Z_2[X]/p(X)$ and thus may be 
viewed as the set of all polynomials of degree less than 8 in $Z_2[X]$ with the operations of addition and 
multiplication modulo p(X). These operations may be implemented efficiently by exploiting the 
mapping from $Z_2[X]$ to the natural numbers given by 

\[ a_n X^n + \dots + a_1 X + a_0 \;\to\; 2^n a_n + \dots + 2 a_1 + a_0 \;\to\; a_n \dots a_1 a_0\,\mathrm{b} \]

For example: 

$1 \to$ 01h 
$X \to$ 02h 
$X^2 \to$ 04h 
$X^4 + X^3 + 1 \to$ 19h 
$p(X) \to$ 11Bh 

Thus, each element of GF is identified with a unique byte. This overview uses the data type GF256 as 
an alias of nat, to identify variables that are to be thought of as elements of GF. 

The operations of addition and multiplication in GF are denoted by $\oplus$ and $\odot$, respectively. Since $Z_2$ is 
of characteristic 2, addition is simply the “exclusive or” operation: 

\[ x \oplus y = x \,\hat{}\, y \]

In particular, every element of GF is its own additive inverse. 

Multiplication in GF may be computed as a sequence of additions and multiplications by 2. Note that 
this operation may be viewed as multiplication in $Z_2[X]$ followed by a possible reduction modulo p(X). 
Since 2 corresponds to the polynomial X and 11Bh corresponds to p(X), for any $x \in GF$, 

\[
2 \odot x =
\begin{cases}
x \ll 1 & \text{if } x < 80\mathrm{h} \\
(x \ll 1) \oplus 11\mathrm{Bh} & \text{if } x \ge 80\mathrm{h}
\end{cases}
\]

Now, if $y = b_7 \dots b_1 b_0\,\mathrm{b}$, then 

\[ x \odot y = 2 \odot (\dots(2 \odot (2 \odot (b_7 \odot x) \oplus b_6 \odot x) \oplus b_5 \odot x) \dots) \oplus b_0 \odot x. \]

This computation is performed by the GFMul( ) function. 

A.4.1 Multiplication in the Field GF 

The GFMul( ) function operates on the GF256 elements x and y and returns their product, which is 
also a GF256 element. 

GF256 GFMul(GF256 x, GF256 y) { 
  nat sum = 0; 
  for (int i=7; i>=0; i--) { 
    // Multiply sum by 2. This amounts to a shift followed 
    // by reduction mod 0x11B: 
    sum <<= 1; 
    if (sum > 0xFF) {sum = sum ^ 0x11B;} 
    // Add y[i]*x, where y[i] denotes bit i of y: 
    if (y[i]) {sum = sum ^ x;} 
  } 
  return sum; 
} 


Because the multiplicative group GF* is of order 255, the inverse of an element x of GF may be 
computed by repeated multiplication as $x^{-1} = x^{254}$. A more efficient computation, however, is 
performed by the GFInv( ) function as an application of Euclid’s greatest common divisor algorithm. 
See Section A.11, “Computation of GFInv with Euclidean Greatest Common Divisor” for an analysis 
of this computation and the GFInv( ) function. 
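
The field multiplication and the inverse by repeated multiplication described above can be exercised with a short, self-contained C sketch. The names `gf_mul` and `gf_inv_pow` are hypothetical; the sketch replaces the descriptive bit-indexing notation y[i] with an explicit shift and mask, and it uses the slow x^254 method rather than the Euclidean GFInv( ) function.

```c
#include <assert.h>
#include <stdint.h>

/* Product of x and y in GF(2^8), reduced modulo p(X) = 0x11B. */
static uint8_t gf_mul(uint8_t x, uint8_t y)
{
    unsigned sum = 0;
    for (int i = 7; i >= 0; i--) {
        sum <<= 1;                 /* multiply the running sum by 2 */
        if (sum > 0xFF)
            sum ^= 0x11B;          /* reduce when bit 8 is set */
        if ((y >> i) & 1)
            sum ^= x;              /* add (bit i of y) * x */
    }
    return (uint8_t)sum;
}

/* Inverse by repeated multiplication: x^254. Zero maps to zero. */
static uint8_t gf_inv_pow(uint8_t x)
{
    uint8_t r = 1;
    for (int i = 0; i < 254; i++)
        r = gf_mul(r, x);
    return r;
}
```

FIPS 197 gives {57} times {83} = {c1} as a worked multiplication, which makes a convenient check for `gf_mul`; multiplying any nonzero element by its `gf_inv_pow` result should give 1.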

The AES algorithms operate on the vector space $GF^4$, of dimension 4 over GF, which is represented by 
the array type GFWord. FIPS 197 refers to an object of this type as a word. This overview uses the 
term GF word in order to avoid confusion with the AMD64 notion of a 16-bit word. 

A GFMatrix is an array of four GF words, which are viewed as the rows of a 4 x 4 matrix over GF. 

The field operation symbols $\oplus$ and $\odot$ are used to denote addition and multiplication of matrices over 
GF as well. The GFMatrixMul( ) function computes the product $A \odot B$ of 4 x 4 matrices. 

A.4.2 Multiplication of 4x4 Matrices Over GF 

GFMatrix GFMatrixMul(GFMatrix a, GFMatrix b) { 
  GFMatrix c; 
  for (nat i=0; i<4; i++) { 
    for (nat j=0; j<4; j++) { 
      // Entry (i, j) is the GF sum of the products a[i][k]*b[k][j]: 
      c[i][j] = 0; 
      for (nat k=0; k<4; k++) { 
        c[i][j] = c[i][j] ^ GFMul(a[i][k], b[k][j]); 
      } 
    } 
  } 
  return c; 
} 
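
As an illustration of the matrix product, the following self-contained C sketch multiplies two 4 x 4 matrices over GF and checks one known identity. The names are hypothetical, and the two coefficient matrices are an assumption brought in from FIPS 197 (the MixColumns matrix and its inverse); they are not defined in this part of the text.

```c
#include <assert.h>
#include <stdint.h>

/* Product of x and y in GF(2^8), reduced modulo 0x11B. */
static uint8_t gf_mul(uint8_t x, uint8_t y)
{
    unsigned sum = 0;
    for (int i = 7; i >= 0; i--) {
        sum <<= 1;
        if (sum > 0xFF) sum ^= 0x11B;
        if ((y >> i) & 1) sum ^= x;
    }
    return (uint8_t)sum;
}

/* c = a * b over GF(2^8); addition of entries is XOR. */
static void gf_matrix_mul(const uint8_t a[4][4], const uint8_t b[4][4],
                          uint8_t c[4][4])
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            uint8_t s = 0;
            for (int k = 0; k < 4; k++)
                s ^= gf_mul(a[i][k], b[k][j]);
            c[i][j] = s;
        }
}

/* MixColumns coefficient matrix and its inverse (values from FIPS 197). */
static const uint8_t MIX[4][4] = {
    {0x02, 0x03, 0x01, 0x01}, {0x01, 0x02, 0x03, 0x01},
    {0x01, 0x01, 0x02, 0x03}, {0x03, 0x01, 0x01, 0x02}};
static const uint8_t INV_MIX[4][4] = {
    {0x0E, 0x0B, 0x0D, 0x09}, {0x09, 0x0E, 0x0B, 0x0D},
    {0x0D, 0x09, 0x0E, 0x0B}, {0x0B, 0x0D, 0x09, 0x0E}};

/* Returns 1 when MIX * INV_MIX is the identity matrix. */
static int mix_times_inverse_is_identity(void)
{
    uint8_t c[4][4];
    gf_matrix_mul(MIX, INV_MIX, c);
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            if (c[i][j] != (i == j ? 1 : 0))
                return 0;
    return 1;
}
```

The identity check is a quick way to confirm that both the field multiplication and the row-by-column accumulation are wired correctly.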


A.5 AES Operations 

The AES encryption and decryption procedures may be specified as follows, in terms of a set of basic 
operations that are defined later in this section. See the alphabetic instruction reference for detailed 
descriptions of the instructions that are used to implement the procedures. 

The Encrypt and Decrypt procedures pass the same expanded key to the functions 

TextBlock Cipher(TextBlock in, ExpandedKey w, nat Nk) 


and 
TextBlock InvCipher(TextBlock in, ExpandedKey w, nat Nk) 

In both cases, the input text is converted by 

GFMatrix Text2Matrix(TextBlock A) 

to a matrix, which becomes the initial state of the process. This state is transformed through the 
sequence of $N_r + 1$ rounds and ultimately converted back to a linear array by 

TextBlock Matrix2Text(GFMatrix M). 

In each round i, the round key $K_i$ is extracted from the expanded key w and added to the state by 

GFMatrix AddRoundKey(GFMatrix state, ExpandedKey w, nat round). 

Note that AddRoundKey does not explicitly construct $K_i$, but operates directly on the bytes of w. 

The rounds of Cipher are numbered $0, \dots, N_r$. Let $X$ be the initial state of an execution, i.e., the input in 
matrix format, let $S_i$ be the state produced by round $i$, and let $Y = S_{N_r}$ be the final state. Let $\Sigma$, $\mathcal{R}$, and $\mathcal{C}$ 
denote the operations performed by SubBytes, ShiftRows, and MixColumns, respectively. 

The initial round is a simple addition: 

\[ S_0 = X \oplus K_0; \]

Each of the next $N_r - 1$ rounds is a composition of four operations: 

\[ S_i = \mathcal{C}(\mathcal{R}(\Sigma(S_{i-1}))) \oplus K_i \quad \text{for } i = 1, \dots, N_r - 1; \]

The MixColumns transformation is omitted from the final round: 

\[ Y = S_{N_r} = \mathcal{R}(\Sigma(S_{N_r-1})) \oplus K_{N_r}. \]

Composing these expressions yields 

\[ Y = \mathcal{R}(\Sigma(\mathcal{C}(\mathcal{R}(\Sigma(\cdots(\mathcal{C}(\mathcal{R}(\Sigma(X \oplus K_0))) \oplus K_1)\cdots))) \oplus K_{N_r-1})) \oplus K_{N_r}. \]

Note that the rounds of InvCipher are numbered in reverse order, N r ,... ,0. If 27’ and F are the initial 
and final states and S is the state following round i , then 
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G' 


= X'<& K Nr ; 


Si = © Ki) for t = jV r — 1,.... 1; 

r = E- 1 (7£- 1 (S' 1 ))© I< 0 . 


Composing these expressions yields 

Y' = • ■ (C -1 (E -1 (72. -1 (X' © K n „) ® K Nr -i)) ■■■))© *h))) © ^o- 


In order to show that InvCipher is the inverse of Cipher, it is only necessary to combine these 
expanded expressions by replacing X with Y and cancel inverse operations to yield Y’ = X. 

A.5.1 Sequence of Operations 

• Use predefined SBox and InvSBox matrices or initialize the matrices using the ComputeSBox
and ComputeInvSBox functions.

• Call the Encrypt or Decrypt procedure. 

• For the Encrypt procedure: 

1. Load the input TextBlock and CipherKey. 

2. Expand the cipher key using the KeyExpansion function. 

3. Call the Cipher function to perform the number of rounds determined by the cipher key length.

4. Perform round entry operations. 

a. Convert input text block to state matrix using the Text2Matrix function. 

b. Combine state and round key bytes by bitwise XOR using the AddRoundKey function. 

5. Perform round iteration operations. 

a. Replace each state byte with another by non-linear substitution using the SubBytes function. 

b. Shift each row of the state cyclically using the ShiftRows function. 

c. Combine the four bytes in each column of the state using the MixColumns function. 

d. Perform AddRoundKey. 

6. Perform round exit operations. 

a. Perform SubBytes. 

b. Perform ShiftRows. 

c. Perform AddRoundKey. 

d. Convert state matrix to output text block using the Matrix2Text function and return TextBlock. 

• For the Decrypt procedure: 

1. Load the input TextBlock and CipherKey.

2. Expand the cipher key using the KeyExpansion function.

3. Call the InvCipher function to perform the number of rounds determined by the cipher key 
length. 

4. Perform round entry operations. 

a. Convert input text block to state matrix using the Text2Matrix function. 

b. Combine state and round key bytes by bitwise XOR using the AddRoundKey function. 

5. Perform round iteration operations. 

a. Shift each row of the state cyclically using the InvShiftRows function. 

b. Replace each state byte with another by non-linear substitution using the InvSubBytes function. 

c. Perform AddRoundKey. 

d. Combine the four bytes in each column of the state using the InvMixColumns function. 

6. Perform round exit operations. 

a. Perform InvShiftRows. 

b. Perform InvSubBytes (InvSubWord). 

c. Perform AddRoundKey. 

d. Convert state matrix to output text block using the Matrix2Text function and return TextBlock. 


A.6 Initializing the SBox and InvSBox Matrices

The AES makes use of a bijective mapping $\sigma : GF \to GF$, which is encoded, along with its inverse
mapping, in the 16 × 16 arrays SBox (for encryption) and InvSBox (for decryption), as follows:

for all $x \in GF$,

$$\sigma(x) = \mathrm{SBox}[x[7{:}4],\ x[3{:}0]]$$

and

$$\sigma^{-1}(x) = \mathrm{InvSBox}[x[7{:}4],\ x[3{:}0]]$$

While the FIPS 197 standard defines the contents of the SBox[ ] and InvSBox[ ] matrices, the
matrices may also be initialized algebraically (and algorithmically) by means of the ComputeSBox( )
and ComputeInvSBox( ) functions, discussed below.

The bijective mappings for encryption and decryption are computed by the SubByte( ) and
InvSubByte( ) functions, respectively:

SubByte( ) computation:

GF256 SubByte(GF256 x) {
  return SBox[x[7:4]][x[3:0]];
}

InvSubByte( ) computation:

GF256 InvSubByte(GF256 x) {
  return InvSBox[x[7:4]][x[3:0]];
}
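The nibble indexing `x[7:4]` and `x[3:0]` used by SubByte( ) and InvSubByte( ) can be sketched in Python with a shift and a mask. The function name `sub_byte` and the placeholder table are hypothetical; a real SBox or InvSBox would be passed in place of the placeholder.

```python
def sub_byte(table, x):
    """Look up byte x in a 16 x 16 table: high nibble selects the row, low nibble the column."""
    return table[(x >> 4) & 0xF][x & 0xF]


# Hypothetical stand-in table whose entry at [row][col] is (row << 4) | col,
# so every lookup returns its own index. It only demonstrates the addressing.
IDENTITY_TABLE = [[(row << 4) | col for col in range(16)] for row in range(16)]
```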



A.6.1 Computation of SBox and InvSBox 

Computation of SBox and InvSBox elements has a direct relationship to the cryptographic properties
of the AES, but not to the algorithms that use the tables. Readers who prefer to view $\sigma$ as a primitive
operation may skip the remainder of this section.

The algorithmic definition of the bijective mapping $\sigma$ is based on the consideration of GF as an
8-dimensional vector space over the subfield $\mathbb{Z}_2$. Let $\varphi$ be a linear operator on this vector space and let
$M = [a_{ij}]$ be the matrix representation of $\varphi$ with respect to the ordered basis {1, 2, 4, 8, 10, 20, 40, 80}.
Then $\varphi$ may be encoded concisely as an array of bytes A of dimension 8, each entry of which is the
concatenation of the corresponding row of M:

$$A[i] = a_{i7}\, a_{i6} \cdots a_{i0}$$

This expression may be represented algorithmically by means of the ApplyLinearOp( ) function,
which applies a linear operator to an element of GF. The ApplyLinearOp( ) function is used in the
initialization of both the SBox[ ] and InvSBox[ ] matrices.

// The following function takes the array A representing a linear operator phi and
// an element x of GF and returns phi(x):

GF256 ApplyLinearOp(GF256 A[8], GF256 x) {
  GF256 result = 0;
  for (nat i=0; i<8; i++) {
    bool sum = 0;
    for (nat j=0; j<8; j++) {
      sum = sum ^ (A[i][j] & x[j]);
    }
    result[i] = sum;
  }
  return result;
}
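The bit-level indexing `A[i][j]` and `x[j]` can be sketched in Python with shifts and masks. This is an illustrative sketch under the assumption that bit j of byte A[i] holds the matrix entry in row i, column j; the name `apply_linear_op` is hypothetical, and the array `A` is the one this section gives for the operator used by the SBox.

```python
def apply_linear_op(A, x):
    """Return phi(x), where row i of the 8 x 8 bit matrix is packed into byte A[i]."""
    result = 0
    for i in range(8):
        bit = 0
        for j in range(8):
            # A[i][j] is bit j of A[i]; x[j] is bit j of x.
            bit ^= ((A[i] >> j) & 1) & ((x >> j) & 1)
        result |= bit << i
    return result


A = [0xF1, 0xE3, 0xC7, 0x8F, 0x1F, 0x3E, 0x7C, 0xF8]
```

Because the operator is linear over $\mathbb{Z}_2$, `apply_linear_op(A, x ^ y)` should equal `apply_linear_op(A, x) ^ apply_linear_op(A, y)` for any bytes x and y.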


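The definition of $\sigma$ involves the linear operator $\varphi$ with matrix

$$M = \begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1
\end{bmatrix}$$

In this case,

A = {F1, E3, C7, 8F, 1F, 3E, 7C, F8}.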


Initialization of SBox[ ]

The mapping $\sigma : GF \to GF$ is defined by

$$\sigma(x) = \varphi(x^{-1}) \oplus 63$$

This computation is performed by ComputeSBox( ).

ComputeSBox()

GF256[16][16] ComputeSBox() {
  GF256 result[16][16];
  GF256 A[8] = {0xF1, 0xE3, 0xC7, 0x8F, 0x1F, 0x3E, 0x7C, 0xF8};
  for (nat i=0; i<16; i++) {
    for (nat j=0; j<16; j++) {
      GF256 x = (i << 4) | j;
      result[i][j] = ApplyLinearOp(A, GFInv(x)) ^ 0x63;
    }
  }
  return result;
}

const GF256 SBox[16][16] = ComputeSBox();

Table A-1 shows the resulting SBox[ ], as defined in FIPS 197.



Table A-1. SBox Definition

                                   S[3:0]
S[7:4]   0   1   2   3   4   5   6   7   8   9   a   b   c   d   e   f
  0     63  7c  77  7b  f2  6b  6f  c5  30  01  67  2b  fe  d7  ab  76
  1     ca  82  c9  7d  fa  59  47  f0  ad  d4  a2  af  9c  a4  72  c0
  2     b7  fd  93  26  36  3f  f7  cc  34  a5  e5  f1  71  d8  31  15
  3     04  c7  23  c3  18  96  05  9a  07  12  80  e2  eb  27  b2  75
  4     09  83  2c  1a  1b  6e  5a  a0  52  3b  d6  b3  29  e3  2f  84
  5     53  d1  00  ed  20  fc  b1  5b  6a  cb  be  39  4a  4c  58  cf
  6     d0  ef  aa  fb  43  4d  33  85  45  f9  02  7f  50  3c  9f  a8
  7     51  a3  40  8f  92  9d  38  f5  bc  b6  da  21  10  ff  f3  d2
  8     cd  0c  13  ec  5f  97  44  17  c4  a7  7e  3d  64  5d  19  73
  9     60  81  4f  dc  22  2a  90  88  46  ee  b8  14  de  5e  0b  db
  a     e0  32  3a  0a  49  06  24  5c  c2  d3  ac  62  91  95  e4  79
  b     e7  c8  37  6d  8d  d5  4e  a9  6c  56  f4  ea  65  7a  ae  08
  c     ba  78  25  2e  1c  a6  b4  c6  e8  dd  74  1f  4b  bd  8b  8a
  d     70  3e  b5  66  48  03  f6  0e  61  35  57  b9  86  c1  1d  9e
  e     e1  f8  98  11  69  d9  8e  94  9b  1e  87  e9  ce  55  28  df
  f     8c  a1  89  0d  bf  e6  42  68  41  99  2d  0f  b0  54  bb  16
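The SBox construction can be reproduced with a short Python sketch. The names below are hypothetical, and two simplifications are assumptions made for brevity: `gf_inv` computes the inverse as the 254th power rather than with the Euclidean method of Section A.11, and `gf_mul` reduces by the polynomial 11B.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by 0x11B."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def gf_inv(x):
    """Inverse as x^254, with gf_inv(0) == 0 (brute-force stand-in for GFInv)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, x)
    return result


def apply_linear_op(A, x):
    """Apply the bit matrix packed row-wise into A to the byte x."""
    result = 0
    for i in range(8):
        bit = 0
        for j in range(8):
            bit ^= ((A[i] >> j) & 1) & ((x >> j) & 1)
        result |= bit << i
    return result


def compute_sbox():
    """Build the 16 x 16 table sigma(x) = phi(x^-1) XOR 0x63."""
    A = [0xF1, 0xE3, 0xC7, 0x8F, 0x1F, 0x3E, 0x7C, 0xF8]
    return [[apply_linear_op(A, gf_inv((i << 4) | j)) ^ 0x63 for j in range(16)]
            for i in range(16)]


SBOX = compute_sbox()
```

Spot checks against Table A-1, such as `SBOX[5][3]` for the input byte 53, are a cheap way to confirm that the bit ordering of `A` matches the table.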


A.6.2 Initialization of InvSBox[ ]

A straightforward calculation confirms that the matrix M is nonsingular with inverse

$$M^{-1} = \begin{bmatrix}
0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 & 0
\end{bmatrix}$$

Thus, $\varphi$ is invertible and $\varphi^{-1}$ is encoded as the array

B = {A4, 49, 92, 25, 4A, 94, 29, 52}.

If $y = \sigma(x)$, then

$$\begin{aligned}
(\varphi^{-1}(y) \oplus 5)^{-1} &= (\varphi^{-1}(y \oplus \varphi(5)))^{-1} \\
&= (\varphi^{-1}(y \oplus 63))^{-1} \\
&= (\varphi^{-1}(\varphi(x^{-1}) \oplus 63 \oplus 63))^{-1} \\
&= (\varphi^{-1}(\varphi(x^{-1})))^{-1} \\
&= x,
\end{aligned}$$

and $\sigma$ is a permutation of GF with

$$\sigma^{-1}(y) = (\varphi^{-1}(y) \oplus 5)^{-1}$$

This computation is performed by ComputeInvSBox( ).

ComputeInvSBox()

GF256[16][16] ComputeInvSBox() {
  GF256 result[16][16];
  GF256 B[8] = {0xA4, 0x49, 0x92, 0x25, 0x4A, 0x94, 0x29, 0x52};
  for (nat i=0; i<16; i++) {
    for (nat j=0; j<16; j++) {
      GF256 y = (i << 4) | j;
      result[i][j] = GFInv(ApplyLinearOp(B, y) ^ 0x5);
    }
  }
  return result;
}

const GF256 InvSBox[16][16] = ComputeInvSBox();

Table A-2 shows the resulting InvSBox[ ], as defined in FIPS 197.

Table A-2. InvSBox Definition

                                   S[3:0]
S[7:4]   0   1   2   3   4   5   6   7   8   9   a   b   c   d   e   f
  0     52  09  6a  d5  30  36  a5  38  bf  40  a3  9e  81  f3  d7  fb
  1     7c  e3  39  82  9b  2f  ff  87  34  8e  43  44  c4  de  e9  cb
  2     54  7b  94  32  a6  c2  23  3d  ee  4c  95  0b  42  fa  c3  4e
  3     08  2e  a1  66  28  d9  24  b2  76  5b  a2  49  6d  8b  d1  25
  4     72  f8  f6  64  86  68  98  16  d4  a4  5c  cc  5d  65  b6  92
  5     6c  70  48  50  fd  ed  b9  da  5e  15  46  57  a7  8d  9d  84
  6     90  d8  ab  00  8c  bc  d3  0a  f7  e4  58  05  b8  b3  45  06
  7     d0  2c  1e  8f  ca  3f  0f  02  c1  af  bd  03  01  13  8a  6b
  8     3a  91  11  41  4f  67  dc  ea  97  f2  cf  ce  f0  b4  e6  73
  9     96  ac  74  22  e7  ad  35  85  e2  f9  37  e8  1c  75  df  6e
  a     47  f1  1a  71  1d  29  c5  89  6f  b7  62  0e  aa  18  be  1b
  b     fc  56  3e  4b  c6  d2  79  20  9a  db  c0  fe  78  cd  5a  f4
  c     1f  dd  a8  33  88  07  c7  31  b1  12  10  59  27  80  ec  5f
  d     60  51  7f  a9  19  b5  4a  0d  2d  e5  7a  9f  93  c9  9c  ef
  e     a0  e0  3b  4d  ae  2a  f5  b0  c8  eb  bb  3c  83  53  99  61
  f     17  2b  04  7e  ba  77  d6  26  e1  69  14  63  55  21  0c  7d

A.7 Encryption and Decryption 

The AMD64 architecture implements the AES algorithm by means of an iterative function called a 
round for both encryption and the inverse operation, decryption. 

The top-level encryption and decryption procedures Encrypt( ) and Decrypt( ) set up the rounds and
invoke the functions that perform them. Each of the procedures takes two 128-bit binary arguments:

• input data — a 16-byte block of text stored in a source 128-bit XMM register 

• cipher key — a 16-, 24-, or 32-byte cipher key stored in either a second 128-bit XMM register or 
128-bit memory location 

A.7.1 The Encrypt() and Decrypt() Procedures 

TextBlock Encrypt(TextBlock in, CipherKey key, nat Nk) {
  return Cipher(in, ExpandKey(key, Nk), Nk);
}

TextBlock Decrypt(TextBlock in, CipherKey key, nat Nk) {
  return InvCipher(in, ExpandKey(key, Nk), Nk);
}

The array types TextBlock and CipherKey are introduced to accommodate the text and key
parameters. The 16-, 24-, or 32-byte cipher keys correspond to AES-128, AES-192, or AES-256 key
sizes. The cipher key is logically partitioned into $N_k$ = 4, 6, or 8 AES 32-bit words. $N_k$ is passed as a
parameter to determine the AES version to be executed, and the number of rounds to be performed.

Both the Encrypt( ) and Decrypt( ) procedures invoke the ExpandKey( ) function to expand the
cipher key for use in round key generation. When key expansion is complete, either the Cipher( ) or
InvCipher( ) function is invoked.

The Cipher( ) and InvCipher( ) functions are the key components of the encryption and decryption 
process. See Section A.8, “The Cipher Function” and Section A.9, “The InvCipher Function” for 
detailed information. 

A.7.2 Round Sequences and Key Expansion 

Encryption and decryption are performed in a sequence of rounds indexed by $0, \ldots, N_r$, where $N_r$ is
determined by the number $N_k$ of GF words in the cipher key. A key matrix called a round key is
generated for each round. The number of GF words required to form $N_r + 1$ round keys is equal to
$4(N_r + 1)$. Table A-3 shows the relationship between cipher key length, round sequence length, and
round key length.

Table A-3. Cipher Key, Round Sequence, and Round Key Length

N_k    N_r    4(N_r + 1)
 4     10     44
 6     12     52
 8     14     60

Expanded keys are generated from the cipher key by the ExpandKey( ) function, where the array type
ExpandedKey is defined to accommodate 60 words (the maximum required) corresponding to $N_k = 8$.

The ExpandKey() Function

ExpandedKey ExpandKey(CipherKey key, nat Nk) {
  assert((Nk == 4) || (Nk == 6) || (Nk == 8));
  nat Nr = Nk + 6;
  ExpandedKey w;

  // Copy key into first Nk rows of w:
  for (nat i=0; i<Nk; i++) {
    for (nat j=0; j<4; j++) {
      w[i][j] = key[4*i+j];
    }
  }

  // Write next row of w:
  for (nat i=Nk; i<4*(Nr+1); i++) {
    // Encode preceding row:
    GFWord tmp = w[i-1];
    if (mod(i, Nk) == 0) {
      tmp = SubWord(RotWord(tmp));
      tmp[0] = tmp[0] ^ RCON[i/Nk];
    }
    else if ((Nk == 8) && (mod(i, Nk) == 4)) {
      tmp = SubWord(tmp);
    }
    // XOR tmp with w[i-Nk]:
    for (nat j=0; j<4; j++) {
      w[i][j] = w[i-Nk][j] ^ tmp[j];
    }
  }
  return w;
}


ExpandKey( ) begins by copying the input cipher key into the first $N_k$ GF words of the expanded key
w. The remaining $4(N_r + 1) - N_k$ GF words are computed iteratively. For each $i \ge N_k$, $w[i]$ is derived
from the two GF words $w[i-1]$ and $w[i-N_k]$. In most cases, $w[i]$ is simply the sum $w[i-1] \oplus w[i-N_k]$.
There are two exceptions:

• If $i$ is divisible by $N_k$, then before adding it to $w[i-N_k]$, $w[i-1]$ is first rotated by one position to
the left by RotWord( ), then transformed by the substitution SubWord( ), and an element of the
array RCON is added to it.

RCON[11] = {00h, 01h, 02h, 04h, 08h, 10h, 20h, 40h, 80h, 1Bh, 36h}

• In the case $N_k = 8$, if $i$ is divisible by 4 but not 8, then $w[i-1]$ is transformed by the substitution
SubWord( ).

The $i$th round key $K_i$ comprises the four GF words $w[4i], \ldots, w[4i+3]$. More precisely, let $W_i$ be the
matrix

$$W_i = \{w[4i],\ w[4i+1],\ w[4i+2],\ w[4i+3]\}$$

Then $K_i = W_i^{T}$, the transpose of $W_i$. Thus, the entries of the array w are the columns of the round keys.
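The key expansion steps above can be sketched in Python. This is an illustration under stated assumptions, not the manual's code: all names are hypothetical, the SBox is recomputed locally so the block is self-contained, and the expanded key is held as a list of 4-byte lists. The expected words in the check come from the AES-128 key expansion example in FIPS 197.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by 0x11B."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def gf_inv(x):
    """Inverse as x^254, with gf_inv(0) == 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, x)
    return result


AFFINE = [0xF1, 0xE3, 0xC7, 0x8F, 0x1F, 0x3E, 0x7C, 0xF8]


def sub_byte(x):
    """SBox value for x: affine map of the inverse, XOR 0x63."""
    y = gf_inv(x)
    bits = sum((bin(AFFINE[i] & y).count("1") & 1) << i for i in range(8))
    return bits ^ 0x63


SBOX = [sub_byte(x) for x in range(256)]
RCON = [0x00, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def expand_key(key):
    """Expand a 16-, 24-, or 32-byte key into 4 * (Nr + 1) four-byte words."""
    nk = len(key) // 4
    assert nk in (4, 6, 8)
    w = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    for i in range(nk, 4 * (nk + 7)):          # Nr + 1 == Nk + 7
        tmp = list(w[i - 1])
        if i % nk == 0:
            tmp = [SBOX[b] for b in tmp[1:] + tmp[:1]]   # RotWord, then SubWord
            tmp[0] ^= RCON[i // nk]
        elif nk == 8 and i % nk == 4:
            tmp = [SBOX[b] for b in tmp]                 # SubWord only
        w.append([w[i - nk][j] ^ tmp[j] for j in range(4)])
    return w


w = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
```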


A.8 The Cipher Function 

This function performs encryption. It converts the input text to matrix form, generates the round key
from the expanded key matrix, and iterates through the transforming functions the number of times
determined by encryption key size to produce a 128-bit binary cipher matrix. As a final step, it
converts the matrix to an output text block.

TextBlock Cipher(TextBlock in, ExpandedKey w, nat Nk) {
  assert((Nk == 4) || (Nk == 6) || (Nk == 8));
  nat Nr = Nk + 6;
  GFMatrix state = Text2Matrix(in);
  state = AddRoundKey(state, w, 0);
  for (nat round=1; round<Nr; round++) {
    state = SubBytes(state);
    state = ShiftRows(state);
    state = MixColumns(state);
    state = AddRoundKey(state, w, round);
  }
  // Final round: MixColumns is omitted.
  state = SubBytes(state);
  state = ShiftRows(state);
  state = AddRoundKey(state, w, Nr);
  return Matrix2Text(state);
}
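The round structure of Cipher( ) can be exercised end to end with a compact Python sketch. Everything here is illustrative: the names are hypothetical, the SBox is recomputed locally, ShiftRows is written as FIPS 197 defines it (row i rotated left by i positions), and the final round is folded into the loop with a guard instead of being written out separately. The expected ciphertexts in the check are the FIPS 197 example vectors.

```python
def gf_mul(a, b):
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B          # reduce by the AES polynomial
        b >>= 1
    return product


def gf_inv(x):
    result = 1
    for _ in range(254):        # x^254 is the inverse; 0 maps to 0
        result = gf_mul(result, x)
    return result


AFFINE = [0xF1, 0xE3, 0xC7, 0x8F, 0x1F, 0x3E, 0x7C, 0xF8]
SBOX = [sum((bin(AFFINE[i] & gf_inv(x)).count("1") & 1) << i for i in range(8)) ^ 0x63
        for x in range(256)]
RCON = [0x00, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]
MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]


def expand_key(key):
    nk = len(key) // 4
    w = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    for i in range(nk, 4 * (nk + 7)):
        tmp = list(w[i - 1])
        if i % nk == 0:
            tmp = [SBOX[b] for b in tmp[1:] + tmp[:1]]
            tmp[0] ^= RCON[i // nk]
        elif nk == 8 and i % nk == 4:
            tmp = [SBOX[b] for b in tmp]
        w.append([w[i - nk][j] ^ tmp[j] for j in range(4)])
    return w


def add_round_key(state, w, rnd):
    # Word w[4*rnd + j] supplies column j of the round key.
    return [[state[i][j] ^ w[4 * rnd + j][i] for j in range(4)] for i in range(4)]


def mix_columns(state):
    out = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            for k in range(4):
                out[i][j] ^= gf_mul(MIX[i][k], state[k][j])
    return out


def cipher(block, w):
    nr = len(w) // 4 - 1
    state = [[block[4 * j + i] for j in range(4)] for i in range(4)]    # Text2Matrix
    state = add_round_key(state, w, 0)
    for rnd in range(1, nr + 1):
        state = [[SBOX[b] for b in row] for row in state]                # SubBytes
        state = [row[i:] + row[:i] for i, row in enumerate(state)]       # ShiftRows
        if rnd < nr:                                                     # skipped in final round
            state = mix_columns(state)
        state = add_round_key(state, w, rnd)
    return bytes(state[i][j] for j in range(4) for i in range(4))       # Matrix2Text


PLAINTEXT = bytes.fromhex("00112233445566778899aabbccddeeff")
```

For example, `cipher(PLAINTEXT, expand_key(bytes(range(16))))` encrypts the FIPS 197 example block under the 128-bit key 00 01 ... 0f.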


A.8.1 Text to Matrix Conversion 

Prior to processing, the input text block must be converted to matrix form. The Text2Matrix( ) 
function stores a TextBlock in a GFMatrix in column-major order as follows. 

GFMatrix Text2Matrix(TextBlock A) {
  GFMatrix result;
  for (nat j=0; j<4; j++) {
    for (nat i=0; i<4; i++) {
      result[i][j] = A[4*j+i];
    }
  }
  return result;
}


A.8.2 Cipher Transformations 

The Cipher function employs the following transformations. 

SubBytes( ) — Applies a non-linear substitution table (SBox) to each byte of the state. 

SubWord() — Uses a non-linear substitution table (SBox) to produce a four-byte AES output 
word from the four bytes of an AES input word. 

ShiftRows( ) — Cyclically shifts the last three rows of the state by various offsets. 

RotWord( ) — Rotates an AES (4-byte) word to the right. 

MixColumns( ) — Mixes data in all the state columns independently to produce new columns. 

AddRoundKey( ) — Extracts a 128-bit round key from the expanded key matrix and adds it to the 
128-bit state using an XOR operation. 

Inverses of SubBytes( ), SubWord( ), ShiftRows( ) and MixColumns( ) are used in decryption. See
Section A.9, “The InvCipher Function” for more information.

SubBytes() Function

Performs a byte substitution operation using the invertible substitution table (SBox) to convert input
text to an intermediate encryption state.

GFMatrix SubBytes(GFMatrix M) {
  GFMatrix result;
  for (nat i=0; i<4; i++) {
    result[i] = SubWord(M[i]);
  }
  return result;
}

SubWord() Function

Applies SubByte( ) to each byte of a word:

GFWord SubWord(GFWord x) {
  GFWord result;
  for (nat i=0; i<4; i++) {
    result[i] = SubByte(x[i]);
  }
  return result;
}

ShiftRows() Function

Cyclically shifts the last three rows of the state by various offsets.

GFMatrix ShiftRows(GFMatrix M) {
  GFMatrix result;
  for (nat i=0; i<4; i++) {
    result[i] = RotateLeft(M[i], -i);
  }
  return result;
}

RotWord() Function

Performs byte-wise cyclic permutation of a 32-bit AES word.

GFWord RotWord(GFWord x) {
  return RotateLeft(x, 1);
}


MixColumns() Function

Performs a byte-oriented column-by-column matrix multiplication

$M \to C \odot M$, where C is the predefined fixed matrix

$$C = \begin{bmatrix}
2 & 3 & 1 & 1 \\
1 & 2 & 3 & 1 \\
1 & 1 & 2 & 3 \\
3 & 1 & 1 & 2
\end{bmatrix}$$

The function is implemented as follows:

GFMatrix MixColumns(GFMatrix M) {
  GFMatrix C = {
    {0x02,0x03,0x01,0x01},
    {0x01,0x02,0x03,0x01},
    {0x01,0x01,0x02,0x03},
    {0x03,0x01,0x01,0x02}
  };
  return GFMatrixMul(C, M);
}


AddRoundKey() Function 

Extracts the round key from the expanded key and adds it to the state using a bitwise XOR operation. 

GFMatrix AddRoundKey(GFMatrix state, ExpandedKey w, nat round) {
  GFMatrix result = state;
  for (nat i=0; i<4; i++) {
    for (nat j=0; j<4; j++) {
      // Word w[4*round+j] supplies column j of the round key.
      result[i][j] = result[i][j] ^ w[4*round+j][i];
    }
  }
  return result;
}


A.8.3 Matrix to Text Conversion 

After processing, the output matrix must be converted to a text block. The Matrix2Text( ) function 
converts a GFMatrix in column-major order to a TextBlock as follows. 

TextBlock Matrix2Text(GFMatrix M) {
  TextBlock result;
  for (nat j=0; j<4; j++) {
    for (nat i=0; i<4; i++) {
      result[4*j+i] = M[i][j];
    }
  }
  return result;
}
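The column-major layout shared by Text2Matrix( ) and Matrix2Text( ) can be seen in a small Python sketch. The function names are hypothetical; the index arithmetic follows the pseudocode, with byte 4j + i of the text block stored at row i, column j.

```python
def text2matrix(block):
    """Store a 16-byte block in a 4 x 4 matrix in column-major order."""
    return [[block[4 * j + i] for j in range(4)] for i in range(4)]


def matrix2text(matrix):
    """Read a 4 x 4 matrix back out column by column."""
    return bytes(matrix[i][j] for j in range(4) for i in range(4))


block = bytes(range(16))
state = text2matrix(block)
```

With the input bytes 0 through 15, the first row of `state` holds every fourth byte, and converting back restores the original block.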


A.9 The InvCipher Function 

This function performs decryption. It iterates through the round function the number of times 
determined by encryption key size and produces a 128-bit block of text as output. 

TextBlock InvCipher(TextBlock in, ExpandedKey w, nat Nk) {
  assert((Nk == 4) || (Nk == 6) || (Nk == 8));
  nat Nr = Nk + 6;
  GFMatrix state = Text2Matrix(in);
  state = AddRoundKey(state, w, Nr);
  for (nat round=Nr-1; round>0; round--) {
    state = InvShiftRows(state);
    state = InvSubBytes(state);
    state = AddRoundKey(state, w, round);
    state = InvMixColumns(state);
  }
  // Final round: InvMixColumns is omitted.
  state = InvShiftRows(state);
  state = InvSubBytes(state);
  state = AddRoundKey(state, w, 0);
  return Matrix2Text(state);
}
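The decryption round order can be checked with a Python sketch that decrypts the FIPS 197 example ciphertext. The names are hypothetical, the tables are recomputed locally so the block stands alone, InvShiftRows is written as a right rotation of row i by i positions (the inverse of ShiftRows as FIPS 197 defines it), and the final round is folded into the loop with a guard.

```python
def gf_mul(a, b):
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B          # reduce by the AES polynomial
        b >>= 1
    return product


def gf_inv(x):
    result = 1
    for _ in range(254):        # x^254 is the inverse; 0 maps to 0
        result = gf_mul(result, x)
    return result


AFFINE = [0xF1, 0xE3, 0xC7, 0x8F, 0x1F, 0x3E, 0x7C, 0xF8]
SBOX = [sum((bin(AFFINE[i] & gf_inv(x)).count("1") & 1) << i for i in range(8)) ^ 0x63
        for x in range(256)]
INV_SBOX = [0] * 256
for value, substituted in enumerate(SBOX):
    INV_SBOX[substituted] = value      # invert the permutation by table lookup
RCON = [0x00, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]
INV_MIX = [[0x0E, 0x0B, 0x0D, 0x09], [0x09, 0x0E, 0x0B, 0x0D],
           [0x0D, 0x09, 0x0E, 0x0B], [0x0B, 0x0D, 0x09, 0x0E]]


def expand_key(key):
    nk = len(key) // 4
    w = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    for i in range(nk, 4 * (nk + 7)):
        tmp = list(w[i - 1])
        if i % nk == 0:
            tmp = [SBOX[b] for b in tmp[1:] + tmp[:1]]
            tmp[0] ^= RCON[i // nk]
        elif nk == 8 and i % nk == 4:
            tmp = [SBOX[b] for b in tmp]
        w.append([w[i - nk][j] ^ tmp[j] for j in range(4)])
    return w


def add_round_key(state, w, rnd):
    return [[state[i][j] ^ w[4 * rnd + j][i] for j in range(4)] for i in range(4)]


def inv_mix_columns(state):
    out = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            for k in range(4):
                out[i][j] ^= gf_mul(INV_MIX[i][k], state[k][j])
    return out


def inv_cipher(block, w):
    nr = len(w) // 4 - 1
    state = [[block[4 * j + i] for j in range(4)] for i in range(4)]    # Text2Matrix
    state = add_round_key(state, w, nr)
    for rnd in range(nr - 1, -1, -1):
        state = [row[-i:] + row[:-i] for i, row in enumerate(state)]     # InvShiftRows
        state = [[INV_SBOX[b] for b in row] for row in state]            # InvSubBytes
        state = add_round_key(state, w, rnd)
        if rnd > 0:                                                      # skipped in final round
            state = inv_mix_columns(state)
    return bytes(state[i][j] for j in range(4) for i in range(4))       # Matrix2Text


CIPHERTEXT = bytes.fromhex("69c4e0d86a7b0430d8cdb78070b4c55a")
```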


A.9.1 Text to Matrix Conversion 

Prior to processing, the input text block must be converted to matrix form. The Text2Matrix( ) 
function stores a TextBlock in a GFMatrix in column-major order as follows. 

GFMatrix Text2Matrix(TextBlock A) {
  GFMatrix result;
  for (nat j=0; j<4; j++) {
    for (nat i=0; i<4; i++) {
      result[i][j] = A[4*j+i];
    }
  }
  return result;
}


A.9.2 InvCipher Transformations

The following functions are used in decryption: 

InvShiftRows( ) — The inverse of ShiftRows( ). 

InvSubBytes( ) — The inverse of SubBytes( ). 

InvSubWord( ) — The inverse of SubWord( ). 

InvMixColumns( ) — The inverse of MixColumns( ). 

AddRoundKey( ) — Is its own inverse. 

Decryption is the inverse of encryption and is accomplished by means of the inverses of the
SubBytes( ), SubWord( ), ShiftRows( ) and MixColumns( ) transformations used in encryption.

SubWord( ), SubBytes( ), and ShiftRows( ) are injective. This is also the case with MixColumns( ).
A simple computation shows that C is invertible with

$$C^{-1} = \begin{bmatrix}
E & B & D & 9 \\
9 & E & B & D \\
D & 9 & E & B \\
B & D & 9 & E
\end{bmatrix}$$

InvShiftRows() Function

The inverse of ShiftRows( ).

GFMatrix InvShiftRows(GFMatrix M) {
  GFMatrix result;
  for (nat i=0; i<4; i++) {
    // Rotate by the opposite offset to the one used in ShiftRows().
    result[i] = RotateLeft(M[i], i);
  }
  return result;
}

InvSubBytes() Function

The inverse of SubBytes( ).

GFMatrix InvSubBytes(GFMatrix M) {
  GFMatrix result;
  for (nat i=0; i<4; i++) {
    result[i] = InvSubWord(M[i]);
  }
  return result;
}

InvSubWord() Function

The inverse of SubWord( ): InvSubByte( ) applied to each byte of a word.

GFWord InvSubWord(GFWord x) {
  GFWord result;
  for (nat i=0; i<4; i++) {
    result[i] = InvSubByte(x[i]);
  }
  return result;
}

InvMixColumns() Function

The inverse of the MixColumns( ) function. Multiplies by $C^{-1}$, the inverse of the predefined fixed matrix
C, as discussed previously.

GFMatrix InvMixColumns(GFMatrix M) {
  GFMatrix D = {
    {0x0e,0x0b,0x0d,0x09},
    {0x09,0x0e,0x0b,0x0d},
    {0x0d,0x09,0x0e,0x0b},
    {0x0b,0x0d,0x09,0x0e}
  };
  return GFMatrixMul(D, M);
}
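That the matrix D is the inverse of the MixColumns matrix C can be confirmed numerically. The sketch below uses hypothetical names and assumes multiplication modulo the polynomial 11B; it also applies C to the column (db, 13, 53, 45), a widely quoted MixColumns test column.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by 0x11B."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def gf_matrix_mul(a, b):
    """Product of two 4 x 4 matrices over GF(2^8)."""
    c = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            for k in range(4):
                c[i][j] ^= gf_mul(a[i][k], b[k][j])
    return c


C = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
D = [[0x0E, 0x0B, 0x0D, 0x09], [0x09, 0x0E, 0x0B, 0x0D],
     [0x0D, 0x09, 0x0E, 0x0B], [0x0B, 0x0D, 0x09, 0x0E]]
IDENTITY = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

# A state whose four columns are all (db, 13, 53, 45).
STATE = [[0xDB] * 4, [0x13] * 4, [0x53] * 4, [0x45] * 4]
MIXED = gf_matrix_mul(C, STATE)
```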


AddRoundKey() Function 

Extracts the round key from the expanded key and adds it to the state using a bitwise XOR operation. 

GFMatrix AddRoundKey(GFMatrix state, ExpandedKey w, nat round) {
  GFMatrix result = state;
  for (nat i=0; i<4; i++) {
    for (nat j=0; j<4; j++) {
      result[i][j] = result[i][j] ^ w[4*round+j][i];
    }
  }
  return result;
}

A.9.3 Matrix to Text Conversion

After processing, the output matrix must be converted to a text block. The Matrix2Text( ) function
converts a GFMatrix in column-major order to a TextBlock as follows.

TextBlock Matrix2Text(GFMatrix M) {
  TextBlock result;
  for (nat j=0; j<4; j++) {
    for (nat i=0; i<4; i++) {
      result[4*j+i] = M[i][j];
    }
  }
  return result;
}


A.10 An Alternative Decryption Procedure 

This section outlines an alternative decrypting procedure,

TextBlock EqDecrypt(TextBlock in, CipherKey key, nat Nk):

TextBlock EqDecrypt(TextBlock in, CipherKey key, nat Nk) {
  return EqInvCipher(in, MixRoundKeys(ExpandKey(key, Nk), Nk), Nk);
}

The procedure is based on a variation of InvCipher,

TextBlock EqInvCipher(TextBlock in, ExpandedKey w, nat Nk):

TextBlock EqInvCipher(TextBlock in, ExpandedKey dw, nat Nk) {
  assert((Nk == 4) || (Nk == 6) || (Nk == 8));
  nat Nr = Nk + 6;
  GFMatrix state = Text2Matrix(in);
  state = AddRoundKey(state, dw, Nr);
  for (nat round=Nr-1; round>0; round--) {
    state = InvSubBytes(state);
    state = InvShiftRows(state);
    state = InvMixColumns(state);
    state = AddRoundKey(state, dw, round);
  }
  state = InvSubBytes(state);
  state = InvShiftRows(state);
  state = AddRoundKey(state, dw, 0);
  return Matrix2Text(state);
}

The variant structure more closely resembles that of Cipher. This requires a modification of the
expanded key generated by ExpandKey,

ExpandedKey MixRoundKeys(ExpandedKey w, nat Nk):

ExpandedKey MixRoundKeys(ExpandedKey w, nat Nk) {
  assert((Nk == 4) || (Nk == 6) || (Nk == 8));
  nat Nr = Nk + 6;
  ExpandedKey result;
  GFMatrix roundKey;
  for (nat round=0; round<Nr+1; round++) {
    for (nat i=0; i<4; i++) {
      roundKey[i] = w[4*round+i];
    }
    // The first and last round keys are left unchanged.
    if ((round > 0) && (round < Nr)) {
      roundKey = InvMixRows(roundKey);
    }
    for (nat i=0; i<4; i++) {
      result[4*round+i] = roundKey[i];
    }
  }
  return result;
}

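The transformation MixRoundKeys leaves $K_0$ and $K_{N_r}$ unchanged, but for $i = 1, \ldots, N_r - 1$, it replaces
$W_i$ with the matrix product $W_i \odot Q$, where $Q$ is the transpose of the matrix $C^{-1}$ given in Section A.9.2.

The effect of this is to replace $K_i$ with

$$(W_i \odot Q)^{T} = Q^{T} \odot W_i^{T} = C^{-1} \odot K_i = \mathcal{C}^{-1}(K_i)$$

for $i = 1, \ldots, N_r - 1$.

The equivalence of EqDecrypt and Decrypt follows from two properties of the basic operations:

$\mathcal{C}$ is a linear transformation and therefore, so is $\mathcal{C}^{-1}$;

$\Sigma$ and $\mathcal{R}$ commute, and hence so do $\Sigma^{-1}$ and $\mathcal{R}^{-1}$, for if $S = [s_{ij}]$, then

$$\Sigma(\mathcal{R}(S)) = \begin{bmatrix}
\sigma(s_{00}) & \sigma(s_{01}) & \sigma(s_{02}) & \sigma(s_{03}) \\
\sigma(s_{11}) & \sigma(s_{12}) & \sigma(s_{13}) & \sigma(s_{10}) \\
\sigma(s_{22}) & \sigma(s_{23}) & \sigma(s_{20}) & \sigma(s_{21}) \\
\sigma(s_{33}) & \sigma(s_{30}) & \sigma(s_{31}) & \sigma(s_{32})
\end{bmatrix} = \mathcal{R}(\Sigma(S)).$$

Now let $X''$ and $Y''$ be the initial and final states of an execution of EqDecrypt and let $S''_i$ be the state
following round $i$. Suppose $X'' = X'$. Appealing to the definitions of EqDecrypt and EqInvCipher,
we have

$$S''_{N_r} = X'' \oplus K_{N_r} = X' \oplus K_{N_r} = S'_{N_r},$$

and for $i = N_r - 1, \ldots, 1$, by induction,

$$\begin{aligned}
S''_i &= \mathcal{C}^{-1}(\mathcal{R}^{-1}(\Sigma^{-1}(S''_{i+1}))) \oplus \mathcal{C}^{-1}(K_i) \\
&= \mathcal{C}^{-1}(\Sigma^{-1}(\mathcal{R}^{-1}(S''_{i+1})) \oplus K_i) \\
&= \mathcal{C}^{-1}(\Sigma^{-1}(\mathcal{R}^{-1}(S'_{i+1})) \oplus K_i) \\
&= S'_i.
\end{aligned}$$

Finally,

$$\begin{aligned}
Y'' = S''_0 &= \mathcal{R}^{-1}(\Sigma^{-1}(S''_1)) \oplus K_0 \\
&= \Sigma^{-1}(\mathcal{R}^{-1}(S'_1)) \oplus K_0 \\
&= S'_0 = Y'.
\end{aligned}$$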
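The two properties used in this argument can be checked numerically with a Python sketch. The names are hypothetical, the random matrices are arbitrary test data, and `byte_map` is an arbitrary stand-in for $\sigma$: the commutation property holds for any map applied byte by byte, so the real SBox is not needed for the check.

```python
import random


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by 0x11B."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


INV_MIX = [[0x0E, 0x0B, 0x0D, 0x09], [0x09, 0x0E, 0x0B, 0x0D],
           [0x0D, 0x09, 0x0E, 0x0B], [0x0B, 0x0D, 0x09, 0x0E]]


def inv_mix_columns(state):
    out = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            for k in range(4):
                out[i][j] ^= gf_mul(INV_MIX[i][k], state[k][j])
    return out


def xor_matrix(a, b):
    return [[a[i][j] ^ b[i][j] for j in range(4)] for i in range(4)]


def shift_rows(state):
    return [row[i:] + row[:i] for i, row in enumerate(state)]


def map_bytes(f, state):
    return [[f(b) for b in row] for row in state]


def byte_map(b):
    """Arbitrary byte-wise map standing in for sigma."""
    return (7 * b + 3) & 0xFF


rng = random.Random(0)
S = [[rng.randrange(256) for _ in range(4)] for _ in range(4)]
K = [[rng.randrange(256) for _ in range(4)] for _ in range(4)]

# Linearity: applying the inverse mix to a sum equals the sum of the images.
linear_ok = inv_mix_columns(xor_matrix(S, K)) == xor_matrix(inv_mix_columns(S), inv_mix_columns(K))
# Commutation: a byte-wise map and a row rotation can be applied in either order.
commute_ok = shift_rows(map_bytes(byte_map, S)) == map_bytes(byte_map, shift_rows(S))
```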


A.11 Computation of GFInv with Euclidean Greatest 
Common Divisor 

Note that the operations performed by GFInv( ) are in the ring $\mathbb{Z}_2[X]$ rather than the quotient field GF.

The initial values of the variables $x_1$ and $x_2$ are the inputs $x$ and 11B, the latter representing the
polynomial $p(X)$. The variables $a_1$ and $a_2$ are initialized to 1 and 0.

On each iteration of the loop, a multiple of the lesser of $x_1$ and $x_2$ is added to the other. If $x_1 \le x_2$, then
the values of $x_2$ and $a_2$ are adjusted as follows:

$$x_2 \to x_2 \oplus 2^{s} \odot x_1$$

$$a_2 \to a_2 \oplus 2^{s} \odot a_1$$

where $s$ is the difference in the exponents (i.e., degrees) of $x_1$ and $x_2$. In the remaining case, $x_1$ and $a_1$
are similarly adjusted. This step is repeated until either $x_1 = 0$ or $x_2 = 0$.

We make the following observations:

• On each iteration, the value added to $x_i$ has the same exponent as $x_i$, and hence the sum has lesser
exponent. Therefore, termination is guaranteed.

• Since $p(X)$ is irreducible and $x$ is of smaller degree than $p(X)$, the initial values of $x_1$ and $x_2$ have no
non-trivial common factor. This property is clearly preserved by each step.

• Initially,

$$x_1 \oplus a_1 \odot x = x \oplus x = 0$$

and

$$x_2 \oplus a_2 \odot x = \mathrm{11B} \oplus 0 = \mathrm{11B}$$

are both divisible by 11B. This property is also invariant, since, for example, the above assignments
result in

$$x_2 \oplus a_2 \odot x \to (x_2 \oplus 2^{s} \odot x_1) \oplus (a_2 \oplus 2^{s} \odot a_1) \odot x = (x_2 \oplus a_2 \odot x) \oplus 2^{s} \odot (x_1 \oplus a_1 \odot x).$$

Now suppose that the loop terminates with $x_2 = 0$. Then $x_1$ has no non-trivial factor and, hence, $x_1 = 1$.
Thus, $1 \oplus a_1 \odot x$ is divisible by 11B. Since the final result $y$ is derived by reducing $a_1$ modulo 11B, it
follows that $1 \oplus y \odot x$ is also divisible by 11B and, hence, in the quotient field GF, $1 \oplus y \odot x = 0$,
which implies $y \odot x = 1$.

The computation of the multiplicative inverse utilizing Euclid's algorithm is as follows:

// Computation of multiplicative inverse based on Euclid's algorithm:

GF256 GFInv(GF256 x) {
  if (x == 0) {
    return 0;
  }

  // Initialization:
  nat x1 = x;
  nat x2 = 0x11B; // the irreducible polynomial p(X)
  nat a1 = 1;
  nat a2 = 0;
  nat shift; // difference in exponents

  while ((x1 != 0) && (x2 != 0)) {
    // Termination is guaranteed, since either x1 or x2 decreases on each iteration.
    // We have the following loop invariants, viewing natural numbers as elements of
    // the polynomial ring Z2[X]:
    // (1) x1 and x2 have no common divisor other than 1.
    // (2) x1 ^ GFMul(a1, x) and x2 ^ GFMul(a2, x) are both divisible by p(X).
    if (x1 <= x2) {
      shift = expo(x2) - expo(x1);
      x2 = x2 ^ (x1 << shift);
      a2 = a2 ^ (a1 << shift);
    }
    else {
      shift = expo(x1) - expo(x2);
      x1 = x1 ^ (x2 << shift);
      a1 = a1 ^ (a2 << shift);
    }
  }

  nat y;

  // Since either x1 or x2 is 0, it follows from (1) above that the other is 1.
  if (x1 == 1) { // x2 == 0
    y = a1;
  }
  else if (x2 == 1) { // x1 == 0
    y = a2;
  }
  else {
    assert(false);
  }

  // Now it follows from (2) that GFMul(y, x) ^ 1 is divisible by 0x11b.
  // We need only reduce y modulo 0x11b:
  nat e = expo(y);
  while (e >= 8) {
    y = y ^ (0x11B << (e - 8));
    e = expo(y);
  }
  return y;
}
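The Euclidean procedure can be transcribed into Python and checked against a direct multiplication. The names are hypothetical; `expo` is assumed to return the position of the highest set bit, and `gf_mul` is an independent multiply routine used only to verify the results.

```python
def expo(n):
    """Degree of the polynomial encoded by n (position of the highest set bit)."""
    return n.bit_length() - 1


def gf_inv(x):
    """Multiplicative inverse in GF(2^8) by the extended Euclidean method."""
    if x == 0:
        return 0
    x1, x2 = x, 0x11B        # x2 starts as the irreducible polynomial p(X)
    a1, a2 = 1, 0
    while x1 != 0 and x2 != 0:
        if x1 <= x2:
            shift = expo(x2) - expo(x1)
            x2 ^= x1 << shift
            a2 ^= a1 << shift
        else:
            shift = expo(x1) - expo(x2)
            x1 ^= x2 << shift
            a1 ^= a2 << shift
    # One of x1, x2 is now 0 and the other is 1.
    y = a1 if x1 == 1 else a2
    # Reduce y modulo p(X).
    while expo(y) >= 8:
        y ^= 0x11B << (expo(y) - 8)
    return y


def gf_mul(a, b):
    """Independent GF(2^8) multiply, used only to check gf_inv."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product
```

Multiplying each nonzero byte by its computed inverse and comparing the product with 1 covers every path through the loop.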




Index 


Numeric 


128-bit media instruction. xxix 

16-bit mode. xxix 

256-bit media instruction. xxix 

32-bit mode. xxix 

64-bit media instructions. xxix 

64-bit mode. xxix 

A 

absolute displacement. xxx 

ADDPD. 23 

ADDPS. 25 

Address space identifier. xxx 

Address space identifier (ASID). xxx

ADDSD. 27 

ADDSS. 29 

ADDSUBPD. 31 

ADDSUBPS. 33 

Advanced Encryption Standard (AES). xxx, 973 

data structures. 974 

decryption. 976, 984, 992 

encryption. 976, 984 

Euclidean common divisor. 994 

InvSbox. 979 

operations. 978 

Sbox. 979 

AESDEC. 35 

AESDECLAST. 37 

AESENC. 39 

AESENCLAST. 41 

AESIMC. 43 

AESKEYGENASSIST. 45 

ANDNPD. 47 

ANDNPS. 49 

ANDPD. 51 

ANDPS. 53 

ASID. xxx 

AVX. xxx 

B 

biased exponent. xxx 

BLENDPD. 55 

BLENDPS. 57 

BLENDVPD. 59 

BLENDVPS. 61

byte. xxx 


C 


clear. xxx 

cleared. xxx 

CMPPD. 63 

CMPPS. 67 

CMPSD. 71 

CMPSS. 75 

COMISD. 79 

COMISS. 82 

commit. xxx 

compatibility mode. xxx 

Current privilege level (CPL). xxx 

CVTDQ2PD. 84 

CVTDQ2PS. 86 

CVTPD2DQ. 88 

CVTPD2PS. 90 

CVTPS2DQ. 92 

CVTPS2PD. 94 

CVTSD2SI. 96 

CVTSD2SS. 99 

CVTSI2SD. 101 

CVTSI2SS. 104 

CVTSS2SD. 107 

CVTSS2SI. 109 

CVTTPD2DQ. 112 

CVTTPS2DQ. 115 

CVTTSD2SI. 117 

CVTTSS2SI. 120 

D 

Definitions. xxix 

direct referencing. xxx 

displacement. xxx 

DIVPD. 123 

DIVPS. 125 

DIVSD. 127 

DIVSS. 129 

double quadword. xxxi 

doubleword. xxxi 

DPPD. 131 

DPPS. 134 

E 

effective address size. xxxi 

effective operand size. xxxi 

element. xxxi 

endian order. xxxix 




exception. xxxi 

exponent. xxx 

extended SSE. xxxi 

extended-register prefix. xxxiv 

EXTRQ. 139 


F 


flush. xxxi 

FMA. xxxi 

FMA4. xxxi 

four-operand instruction. 6 


G 


General notation. xxviii 

Global descriptor table (GDT). xxxi 

Global interrupt flag (GIF). xxxii 

H 


HADDPD. 141 

HADDPS. 143 

HSUBPD. 146 

HSUBPS. 149 

I 

IGN. xxxii 

immediate operands. 4 

indirect. xxxii 

INSERTPS. 152 

INSERTQ. 154 

instructions 

AES. xxx 

Interrupt descriptor table (IDT). xxxii 

Interrupt redirection bitmap (IRB). xxxii 

Interrupt stack table (IST). xxxii

Interrupt vector table (IVT). xxxii 

L 


LDDQU. 156 

LDMXCSR. 158 

least significant byte. xxxiii 

least-significant bit. xxxiii 

legacy mode. xxxii 

legacy x86. xxxii 

little endian. xxxix 

Local descriptor table (LDT). xxxii 

long mode. xxxii 

LSB. xxxiii 

lsb. xxxiii 


M 


main memory. xxxiii


mask. xxxiii 

MASKMOVDQU. 160 

MAXPD. 162 

MAXPS. 165 

MAXSD. 168 

MAXSS. 170 

memory. xxxiii 

MINPD. 172

MINPS. 175

MINSD. 178 

MINSS. 180 

modes 

32-bit. xxix 

64-bit. xxix 

compatibility. xxx 

legacy. xxxii 

long. xxxii 

protected. xxxiv 

real. xxxiv 

virtual-8086. xxxvi 

most significant bit. xxxiii 

most significant byte. xxxiii 

MOVAPD. 182 

MOVAPS. 184 

MOVD. 186 

MOVDDUP. 188 

MOVDQA. 190 

MOVDQU. 192 

MOVHLPS. 194 

MOVHPD. 196 

MOVHPS. 198 

MOVLHPS. 200 

MOVLPD. 202 

MOVLPS. 204 

MOVMSKPD. 206 

MOVMSKPS. 208 

MOVNTDQ. 210 

MOVNTDQA. 212 

MOVNTPD. 214 

MOVNTPS. 216 

MOVNTSD. 218 

MOVNTSS. 220 

MOVQ. 222 

MOVSD. 224 

MOVSHDUP. 226 

MOVSLDUP. 228 

MOVSS. 230 

MOVUPD. 232 

MOVUPS. 234 

MPSADBW. 236 

MSB. xxxiii 

msb. xxxiii 




MULPD. 241 

MULPS. 243 

MULSD. 245 

MULSS. 247 

Must be zero (MBZ). xxxiii 

N 

Notation 

conventions. xxviii 

register. xxxvi 

O 

octword. xxxiii 

offset. xxxiii 

operands 

immediate. 4 

ORPD. 249 

ORPS. 251 

overflow. xxxiii 

P 

PABSB. 253 

PABSD. 255 

PABSW. 257 

packed. xxxiii 

PACKSSDW. 259 

PACKSSWB. 261 

PACKUSDW. 263 

PACKUSWB. 265 

PADDB. 267 

PADDD. 269 

PADDQ. 271 

PADDSB. 273 

PADDSW. 275 

PADDUSB. 277 

PADDUSW. 279 

PADDW. 281 

PALIGNR. 283 

PAND. 285 

PANDN. 287 

PAVGB. 289 

PAVGW. 291 

PBLENDVB. 293 

PBLENDW. 295 

PCLMULQDQ. 297 

PCMPEQB. 299 

PCMPEQD. 301 

PCMPEQQ. 303 

PCMPEQW. 305 

PCMPESTRI. 307 

PCMPESTRM. 310 

PCMPGTB. 313 


PCMPGTD. 315 

PCMPGTQ. 317 

PCMPGTW. 319 

PCMPISTRI. 321 

PCMPISTRM. 324 

PEXTRB. 327 

PEXTRD. 329 

PEXTRQ. 331 

PEXTRW. 333 

PHADDD. 335 

PHADDSW. 337 

PHADDUBD. 768 

PHADDW. 340 

PHMINPOSUW. 343 

PHSUBD. 345 

PHSUBSW. 347 

PHSUBW. 350 

Physical address extension (PAE). xxxiii 

physical memory. xxxiv 

PINSRB. 353 

PINSRD. 356 

PINSRQ. 358 

PINSRW. 360 

PMADDUBSW. 362 

PMADDWD. 365 

PMAXSB. 367 

PMAXSD. 369 

PMAXSW. 371 

PMAXUB. 373 

PMAXUD. 375 

PMAXUW. 377 

PMINSB. 379 

PMINSD. 381 

PMINSW. 383 

PMINUB. 385 

PMINUD. 387 

PMINUW. 389 

PMOVMSKB. 391 

PMOVSXBD. 393 

PMOVSXBQ. 395 

PMOVSXBW. 397 

PMOVSXDQ. 399 

PMOVSXWD. 401 

PMOVSXWQ. 403 

PMOVZXBD. 405 

PMOVZXBQ. 407 

PMOVZXBW. 409 

PMOVZXDQ. 411 

PMOVZXWD. 413 

PMOVZXWQ. 415 

PMULDQ. 417 
PMULHRSW. 419 

PMULHUW. 421 

PMULHW. 423 

PMULLD. 425 

PMULLW. 427 

PMULUDQ. 429 

POR. 431 

probe. xxxiv 

protected mode. xxxiv 

PSADBW. 433 

PSHUFB. 435 

PSHUFD. 437 

PSHUFHW. 440 

PSHUFLW. 443 

PSIGNB. 446 

PSIGND. 448 

PSIGNW. 450 

PSLLD. 452 

PSLLDQ. 455 

PSLLQ. 457 

PSLLW. 460 

PSRAD. 463 

PSRAW. 466 

PSRLD. 469 

PSRLDQ. 472 

PSRLQ. 474 

PSRLW. 477 

PSUBB. 480 

PSUBD. 482 

PSUBQ. 484 

PSUBSB. 486 

PSUBSW. 488 

PSUBUSB. 490 

PSUBUSW. 492 

PSUBW. 494 

PTEST. 496 

PUNPCKHBW. 498 

PUNPCKHDQ. 501 

PUNPCKHQDQ. 504 

PUNPCKHWD. 507 

PUNPCKLBW. 510 

PUNPCKLDQ. 513 

PUNPCKLQDQ. 516 

PUNPCKLWD. 519 

PXOR. 522 

Q 

quadword. xxxiv 

R 

RCPPS. 524 


RCPSS. 526 

Read as zero (RAZ). xxxiv 

real address mode. See real mode 

real mode. xxxiv 

Register extension prefix (REX). xxxiv 

Register notation. xxxvi 

relative. xxxiv 

Relative instruction pointer (RIP). xxxiv 

reserved. xxxiv 

revision history. xxiii 

RIP-relative addressing. xxxiv 

ROUNDPD. 528 

ROUNDSD. 534 

ROUNDSS. 537 

ROUNDPS. 531 

RSQRTPS. 540 

RSQRTSS. 542 

S 

SBZ. xxxiv 

scalar. xxxv 

set. xxxv 

SHUFPD. 558 

SHUFPS. 561 

Single instruction multiple data (SIMD). xxxv 

SQRTPD. 564 

SQRTPS. 566 

SQRTSD. 568 

SQRTSS. 570 

SSE. xxxv 

SSE Instructions 

legacy. xxxii 

SSE instructions 

AVX. xxx 

SSE1. xxxv 

SSE2. xxxv 

SSE3. xxxv 

SSE4.1. xxxv 

SSE4.2. xxxv 

SSE4A. xxxv 

SSSE3. xxxv 

sticky bit. xxxv 

STMXCSR. 572 

Streaming SIMD Extensions. xxxv 

string compare instructions. 10 

string comparison. 10 

SUBPD. 574 

SUBPS. 576 

SUBSD. 578 

SUBSS. 580 
T 

Task state segment (TSS). xxxv 

Terminology. xxix 

three-operand instruction. 5 

two-operand instruction. 4 

U 

UCOMISD. 582 

UCOMISS. 584 

underflow. xxxvi 

UNPCKHPD. 586 

UNPCKHPS. 588 

UNPCKLPD. 590 

UNPCKLPS. 592 

V 

VADDPD. 23 

VADDPS. 25 

VADDSD. 27 

VADDSUBPD. 31 

VADDSUBPS. 33 

VADDSS. 29 

VAESDEC. 35 

VAESDECLAST. 37 

VAESENC. 39 

VAESENCLAST. 41 

VAESIMC. 43 

VAESKEYGENASSIST. 45 

VANDNPD. 47 

VANDNPS. 49 

VANDPD. 51 

VANDPS. 53 

VBLENDPD. 55 

VBLENDPS. 57 

VBLENDVPD. 59 

VBLENDVPS. 61 

VBROADCASTF128. 594 

VBROADCASTI128. 596 

VBROADCASTSD. 598 

VBROADCASTSS. 600 

VCMPPD. 63 

VCMPPS. 67 

VCMPSD. 71 

VCMPSS. 75 

VCOMISD. 79 

VCOMISS. 82 

VCVTDQ2PD. 84 

VCVTDQ2PS. 86 

VCVTPD2DQ. 88 

VCVTPD2PS. 90 

VCVTPH2PS. 602 


VCVTPS2DQ. 92 

VCVTPS2PD. 94 

VCVTPS2PH. 605 

VCVTSD2SI. 96 

VCVTSD2SS. 99 

VCVTSI2SD. 101 

VCVTSI2SS. 104 

VCVTSS2SD. 107 

VCVTSS2SI. 109 

VCVTTPD2DQ. 112 

VCVTTPS2DQ. 115 

VCVTTSD2SI. 117 

VCVTTSS2SI. 120 

VDIVPD. 123 

VDIVPS. 125 

VDIVSD. 127 

VDIVSS. 129 

VDPPD. 131 

VDPPS. 134 

vector. xxxvi 

VEX prefix. xxxvi 

VEXTRACTF128. 609 

VEXTRACTI128. 611 

VFMADD132PD. 613 

VFMADD132PS. 616 

VFMADD132SD. 619 

VFMADD132SS. 622 

VFMADD213PD. 613 

VFMADD213PS. 616 

VFMADD213SD. 619 

VFMADD213SS. 622 

VFMADD231PD. 613 

VFMADD231PS. 616 

VFMADD231SD. 619 

VFMADD231SS. 622 

VFMADDPD. 613 

VFMADDPS. 616 

VFMADDSD. 619 

VFMADDSS. 622 

VFMADDSUB132PD. 625 

VFMADDSUB132PS. 628 

VFMADDSUB213PD. 625 

VFMADDSUB213PS. 628 

VFMADDSUB231PD. 625 

VFMADDSUB231PS. 628 

VFMADDSUBPD. 625 

VFMADDSUBPS. 628 

VFMSUB132PD. 637 

VFMSUB132PS. 640 

VFMSUB132SD. 643 

VFMSUB132SS. 646 
VFMSUB213PD. 637 

VFMSUB213PS. 640 

VFMSUB213SD. 643 

VFMSUB213SS. 646 

VFMSUB231PD. 637 

VFMSUB231PS. 640 

VFMSUB231SD. 643 

VFMSUB231SS. 646 

VFMSUBADD132PD. 631 

VFMSUBADD132PS. 634 

VFMSUBADD213PD. 631 

VFMSUBADD213PS. 634 

VFMSUBADD231PD. 631 

VFMSUBADD231PS. 634 

VFMSUBADDPD. 631 

VFMSUBADDPS. 634 

VFMSUBPD. 637 

VFMSUBPS. 640 

VFMSUBSD. 643 

VFMSUBSS. 646 

VFNMADD132PD. 649 

VFNMADD132PS. 652 

VFNMADD132SS. 658 

VFNMADD213PD. 649 

VFNMADD213PS. 652 

VFNMADD213SS. 658 

VFNMADD231PD. 649 

VFNMADD231PS. 652 

VFNMADD231SS. 658 

VFNMADDPD. 649 

VFNMADDPS. 652 

VFNMADDSD. 655 

VFNMADDSS. 658 

VFNMSUB132PD. 661 

VFNMSUB132PS. 664 

VFNMSUB132SD. 667 

VFNMSUB132SS. 670 

VFNMSUB213PD. 661 

VFNMSUB213PS. 664 

VFNMSUB213SD. 667 

VFNMSUB213SS. 670 

VFNMSUB231PD. 661 

VFNMSUB231PS. 664 

VFNMSUB231SD. 667 

VFNMSUB231SS. 670 

VFNMSUBPD. 661 

VFNMSUBPS. 664 

VFNMSUBSD. 667 

VFNMSUBSS. 670 

VFRCZPD. 673 

VFRCZPS. 675 


VFRCZSD. 677 

VFRCZSS. 679 

VGATHERDPD. 681 

VGATHERDPS. 683 

VGATHERQPD. 685 

VGATHERQPS. 687 

VHADDPD. 141 

VHADDPS. 143 

VHSUBPD. 146 

VHSUBPS. 149 

VINSERTF128. 689 

VINSERTI128. 691 

VINSERTPS. 152 

Virtual machine control block (VMCB). xxxvi 

Virtual machine monitor (VMM). xxxvi 

virtual-8086 mode. xxxvi 

VLDDQU. 156 

VLDMXCSR. 158 

VMASKMOVDQU. 160 

VMASKMOVPD. 693 

VMASKMOVPS. 695 

VMAXPD. 162 

VMAXPS. 165 

VMAXSD. 168 

VMAXSS. 170 

VMINPD. 172 

VMINPS. 175 

VMINSD. 178 

VMINSS. 180 

VMOVAPS. 184 

VMOVD. 186 

VMOVDDUP. 188 

VMOVDQA. 190 

VMOVDQU. 192 

VMOVHLPS. 194 

VMOVHPD. 196 

VMOVHPS. 198 

VMOVLHPS. 200 

VMOVLPD. 202 

VMOVLPS. 204 

VMOVMSKPD. 206 

VMOVMSKPS. 208 

VMOVNTDQ. 210 

VMOVNTDQA. 212 

VMOVNTPD. 214 

VMOVNTPS. 216 

VMOVQ. 222 

VMOVSD. 224 

VMOVSHDUP. 226 

VMOVSLDUP. 228 

VMOVSS. 230 
VMOVUPD. 232 

VMOVUPS. 234 

VMPSADBW. 236 

VMULPD. 241 

VMULPS. 243 

VMULSD. 245 

VMULSS. 247 

VORPD. 249 

VORPS. 251 

VPABSB. 253 

VPABSD. 255 

VPABSW. 257 

VPACKSSDW. 259 

VPACKSSWB. 261 

VPACKUSDW. 263 

VPACKUSWB. 265 

VPADDD. 269 

VPADDQ. 271 

VPADDSB. 273 

VPADDSW. 275 

VPADDUSB. 277 

VPADDUSW. 279 

VPADDW. 281 

VPALIGNR. 283 

VPAND. 285 

VPANDN. 287 

VPAVGB. 289 

VPAVGW. 291 

VPBLENDD. 697 

VPBLENDVB. 293 

VPBLENDW. 295 

VPBROADCASTB. 699 

VPBROADCASTD. 701 

VPBROADCASTQ. 703 

VPBROADCASTW. 705 

VPCLMULQDQ. 297 

VPCMOV. 707 

VPCMPEQB. 299 

VPCMPEQD. 301 

VPCMPEQQ. 303 

VPCMPEQW. 305 

VPCMPESTRI. 307 

VPCMPESTRM. 310 

VPCMPGTB. 313 

VPCMPGTD. 315 

VPCMPGTQ. 317 

VPCMPGTW. 319 

VPCMPISTRI. 321 

VPCMPISTRM. 324 

VPCOMB. 709 

VPCOMD. 711 


VPCOMQ. 713 

VPCOMUB. 715 

VPCOMUD. 717 

VPCOMUQ. 719 

VPCOMUW. 721 

VPCOMW. 723 

VPERM2F128. 725 

VPERM2I128. 727 

VPERMD. 729 

VPERMIL2PD. 731 

VPERMIL2PS. 735 

VPERMILPD. 739 

VPERMILPS. 742 

VPERMPD. 746 

VPERMPS. 748 

VPERMQ. 750 

VPEXTRB. 327 

VPEXTRD. 329 

VPEXTRQ. 331 

VPEXTRW. 333 

VPGATHERDD. 752 

VPGATHERDQ. 754 

VPGATHERQD. 756 

VPGATHERQQ. 758 

VPHADDBD. 760 

VPHADDBQ. 762 

VPHADDBW. 764 

VPHADDD. 335 

VPHADDDQ. 766 

VPHADDSW. 337 

VPHADDUBQ. 770 

VPHADDUBW. 772 

VPHADDUDQ. 774 

VPHADDUWD. 776 

VPHADDUWQ. 778 

VPHADDW. 340 

VPHADDWD. 780 

VPHADDWQ. 782 

VPHMINPOSUW. 343 

VPHSUBBW. 784 

VPHSUBD. 345 

VPHSUBDQ. 786 

VPHSUBSW. 347 

VPHSUBW. 350 

VPHSUBWD. 788 

VPINSRB. 353 

VPINSRD. 356 

VPINSRQ. 358 

VPINSRW. 360 

VPMACSDD. 790 

VPMACSDQH. 792 
VPMACSDQL. 794 

VPMACSSDD. 796 

VPMACSSDQL. 800 

VPMACSSDQH. 798 

VPMACSSWD. 802 

VPMACSSWW. 804 

VPMACSWD. 806 

VPMACSWW. 808 

VPMADCSSWD. 810 

VPMADCSWD. 812 

VPMADDUBSW. 362 

VPMADDWD. 365 

VPMASKMOVD. 814 

VPMASKMOVQ. 816 

VPMAXSB. 367 

VPMAXSD. 369 

VPMAXSW. 371 

VPMAXUB. 373 

VPMAXUD. 375 

VPMAXUW. 377 

VPMINSB. 379 

VPMINSD. 381 

VPMINSW. 383 

VPMINUB. 385 

VPMINUD. 387 

VPMINUW. 389 

VPMOVMSKB. 391 

VPMOVSXBD. 393 

VPMOVSXBQ. 395 

VPMOVSXBW. 397 

VPMOVSXDQ. 399 

VPMOVSXWD. 401 

VPMOVSXWQ. 403 

VPMOVZXBD. 405 

VPMOVZXBQ. 407 

VPMOVZXBW. 409 

VPMOVZXDQ. 411 

VPMOVZXWD. 413 

VPMOVZXWQ. 415 

VPMULDQ. 417 

VPMULHRSW. 419 

VPMULHUW. 421 

VPMULHW. 423 

VPMULLD. 425 

VPMULLW. 427 

VPMULUDQ. 429 

VPOR. 431 

VPPERM. 818 

VPROTB. 820 

VPROTD. 822 

VPROTQ. 824 


VPROTW. 826 

VPSADBW. 433 

VPSHAB. 828 

VPSHAD. 830 

VPSHAQ. 832 

VPSHAW. 834 

VPSHLB. 836 

VPSHLD. 838 

VPSHLQ. 840 

VPSHLW. 842 

VPSHUFB. 435 

VPSHUFD. 437 

VPSHUFHW. 440 

VPSHUFLW. 443 

VPSIGNB. 446 

VPSIGND. 448 

VPSIGNW. 450 

VPSLLD. 452 

VPSLLDQ. 455 

VPSLLQ. 457 

VPSLLVD. 844 

VPSLLVQ. 846 

VPSLLW. 460 

VPSRAD. 463 

VPSRAVD. 848 

VPSRAW. 466 

VPSRLD. 469 

VPSRLDQ. 472 

VPSRLQ. 474 

VPSRLVD. 850 

VPSRLVQ. 852 

VPSRLW. 477 

VPSUBB. 480 

VPSUBD. 482 

VPSUBQ. 484 

VPSUBSB. 486 

VPSUBSW. 488 

VPSUBUSB. 490 

VPSUBUSW. 492 

VPSUBW. 494 

VPTEST. 496 

VPUNPCKHBW. 498 

VPUNPCKHDQ. 501 

VPUNPCKHQDQ. 504 

VPUNPCKHWD. 507 

VPUNPCKLBW. 510 

VPUNPCKLDQ. 513 

VPUNPCKLQDQ. 516 

VPUNPCKLWD. 519 

VPXOR. 522 

VRCPPS. 524 
VRCPSS. 526 

VROUNDPD. 528 

VROUNDPS. 531 

VROUNDSD. 534 

VROUNDSS. 537 

VRSQRTPS. 540 

VRSQRTSS. 542 

VSHUFPD. 558 

VSHUFPS. 561 

VSQRTPD. 564 

VSQRTPS. 566 

VSQRTSD. 568 

VSQRTSS. 570 

VSTMXCSR. 572 

VSUBPD. 574 

VSUBPS. 576 

VSUBSD. 578 

VSUBSS. 580 

VTESTPD. 854 

VTESTPS. 856 

VUCOMISD. 582 

VUCOMISS. 584 

VUNPCKHPD. 586 

VUNPCKHPS. 588 

VUNPCKLPD. 590 

VUNPCKLPS. 592 

VXORPD. 861 

VXORPS. 863 

VZEROALL. 858 

VZEROUPPER. 859 


W 


word. xxxvi 

X 


x86. xxxvi 

XGETBV. 860 

XOP instructions. xxxvi 

XOP prefix. xxxvi 

XORPD. 861 

XORPS. 863 

XRSTOR. 865 

XSAVE. 869 

XSAVEOPT. 873 

XSETBV. 877 